ABSTRACT


This work presents an adaptive approach to the problem of estimating
a sampled, scalar-valued, stochastic process described by an initially
unknown parameter vector. Knowledge of this quantity completely
specifies the statistics of the process, and consequently the optimal
estimator must "learn" the value of the parameter vector. In order
that construction of the optimal estimator be feasible, it is necessary
to consider only those processes whose parameter vector comes from a
finite set of a priori known values. Fortunately, many practical
problems may be represented or adequately approximated by such a model.

The optimal estimator is found to be composed of a set of elemental
estimators and a corresponding set of weighting coefficients, one pair
for each possible value of the parameter vector. This structure is
derived using properties of the conditional mean operator. For
gauss-markov processes the elemental estimators are linear, dynamic
systems, and evaluation of the weighting coefficients involves
relatively simple, nonlinear calculations. The resulting system is
optimum in the sense that it minimizes the expected value of a
positive-definite, quadratic form in terms of the error (a generalized
mean-square-error criterion). Because the system described in this work
is optimal, it differs from previous attempts at adaptive estimation,
all of which have used approximation techniques or suboptimal,
sequential, optimization procedures.

Two examples showing the improvement of an adaptive filter as
compared to a conventional filter are presented and discussed.

I. INTRODUCTION


A.  OUTLINE  OF  THE  PROBLEM 

This investigation concerns the optimal estimation of a sampled,
scalar-valued, gauss-markov stochastic process (briefly, a gaussian
process that possesses a generalized Markov property; see the
definition in Chapter IV) when certain parameters of the process are
initially unknown. It is assumed that the parameters come from a set
that contains a finite number of possibilities which are known a
priori. The stochastic process is thus represented by a set of
elemental stochastic processes (one corresponding to each possible
combination of parameters), a switch that is permanently but randomly
connected to one of the elemental stochastic processes, and a set of a
priori probabilities for the set of switch positions. The elemental
stochastic processes are represented as the outputs of linear dynamic
systems excited by gaussian processes whose time-displaced samples are
independent, i.e., white noise. The stochastic processes may or may not
be stationary. In this analysis the expression "to estimate" will mean
any of the following: to predict (extrapolate), to filter, or to
interpolate. An optimal estimate will be defined as an estimate that
minimizes a generalized mean-square-error performance criterion given
the available data.

The above structure permits optimal estimates to be formed in the
following general cases:

1. The covariance matrix of the process is initially unknown but
must be one of a finite number of matrices.

2. The mean value function of the process is initially unknown but
must be one of a finite collection of deterministic functions.

3. The message component of the process is initially unknown but is
formed by the proper initial conditions, which are assumed to be
gaussianly distributed, on one of a finite number of possible
free, linear, dynamic systems.

4. Any realizable combination of the above cases.

Engineering examples of some of the above situations are listed
below.

A space probe is to telemeter some analog, sampled data at a
predetermined time; however, it is not certain that this data will be
transmitted because it is possible that the space probe's sensor
received no input or that the transmitter failed. This represents a
situation in which the covariance matrix of the process is unknown but
has only two possible forms: the covariance matrix of the noise alone
or the covariance matrix of signal plus noise. This problem is just the
Wiener filtering problem with the added generality that it is possible
that no signal is present.

Consider a control problem, such as an anti-aircraft system, in which
for optimal control it is necessary to predict the future value of a
signal input that is corrupted by additive gaussian noise. A class of
inputs might be described by the outputs of a free, linear, dynamic
system for all possible initial conditions. This form of input
description has been proposed by Kalman and Koepcke [Ref. 1]. A given
control system might very well have to respond optimally to various
classes of inputs, e.g., different targets. One then might ask for the
best prediction of the signal input, given that it came from one of a
finite number of known classes.

Another example concerns the optimal filtering of a signal process
with known covariance matrix that is subject to noise that possesses
one of two possible known covariance matrices. This situation could
occur in a communication or tracking problem in which an enemy might or
might not attempt to jam with noise of known covariance matrix.

B.  PREVIOUS  WORK 

Kalman [Ref. 2] has considered the optimal prediction and filtering
of sampled gauss-markov stochastic processes when the parameters of the
process are known. Rauch [Ref. 3] has extended this analysis to include
interpolation and to handle the case in which the parameters are random
variables, independent from one sample point to the next; the mean
values and variances of these random variables are known. Because of
the time independence of the random parameters, it is impossible to
learn the parameters, and adaptive estimation will offer no improvement
over ordinary linear estimation.

Balakrishnan [Ref. 4] has developed an approximately optimal (with
respect to a mean-square-error performance criterion) computer scheme
for predicting noise-free (i.e., pure prediction with no smoothing)
stochastic sequences. No assumptions are made on the statistics, and
hence the result is very powerful for this specific problem. However,
a number of approximations, whose significance is difficult to assess,
are made. Weaver [Ref. 5] has considered the adaptive filtering problem
in which the noise spectrum is known, but the signal spectrum must be
learned with time. In the limit, the data processing proposed by
Weaver is optimal but, in the transient mode of learning, it is
suboptimal. If signal or noise processes are nonstationary, the data
processor will always be learning and always be suboptimal. Shaw
[Ref. 6] has considered the dual filtering problem in which the signal
process varies randomly between two possible bandwidths.

The  work  in  the  present  investigation  represents  an  extension  of 
the  state-transition  method  of  analysis  utilized  by  Kalman  and  Rauch 
to  the  problem  of  estimation  when  the  parameters  of  the  stochastic 
process  are  initially  unknown  and  must  be  learned. 

C.  OUTLINE  OF  NEW  RESULTS 

The solution to the problem of the optimal estimate of a sampled,
scalar-valued, gauss-markov stochastic process with unknown parameters
(which must come from a finite set of known values) is derived in this
investigation. This solution is to be contrasted with the usual
nonoptimal adaptive estimator proposed in the literature. Typically, it
is suggested that an optimal estimate of the statistics of the process
be made and then the optimal estimator be designed as if this best
estimate were indeed true. This sequential optimization procedure may
converge in the limit with time to the true optimal estimator. However,
for any finite amount of observed data, this approach may not be the
overall optimum procedure. Because of the many finite-duration
estimation problems (e.g., trajectory estimation), the advantage of the
optimal adaptive estimator in the transient mode is important
practically as well as theoretically.

Since the optimal estimator would utilize the correct parameters,
if known, they must be "learned." Thus one is faced with a situation
in which the optimal estimator must adapt itself as it "learns" the
true values of the parameters of the process.

Although the elemental stochastic processes are assumed gaussian,
the probability law of the resultant stochastic process conditioned on
the past data is nongaussian. Shaw [Ref. 6] also found this unfortunate
result. Consequently, linear data processing will not, in general, be
optimal. Usually, nonlinear data processing is quite undesirable since
it can involve a large amount of calculation. Fortunately, by adopting
the proper viewpoint, it can be shown that, for the problem discussed
in this investigation, the nonlinear data processing is of a simple
form. By adopting the conditional-expectation point of view as
advocated by Kalman [Ref. 7], it is proven in the main text that the
optimal estimate is just the sum of the elemental optimal estimates
weighted by the conditional probabilities that the particular set of
parameters is true. Consequently, the only nonlinear processing
consists of calculating probabilities that will be used as weighting
coefficients. Furthermore, the major portion of this calculation is
performed by the elemental optimal estimators or has been performed
previously in order to build them. Therefore, the adaptive estimator
proposed is quite feasible while being optimal even in the transient or
learning mode.

The adaptive estimator described in this dissertation is shown to
be useful for a class of linear-dynamic, quadratic-cost, stochastic
control problems. If the observations of the state vector of the plant
are corrupted by gaussian noise of unknown covariance matrix, then it is
necessary to construct this adaptive estimator in order to implement the
optimal control law.

By utilizing a theorem from Braverman [Ref. 8] it is proved that
the adaptive estimator will converge with probability one to the optimum
estimator based upon the true parameters if the elemental stochastic
processes are ergodic. If the elemental processes are nonstationary,
the weighting coefficients may not converge. Nevertheless, the estimate
formed by the procedure described in this investigation is optimum
given the available data.

The above results are applied to the Wiener filtering problem in
which the presence of the signal component is uncertain. The
performance of the adaptive procedure outlined above is compared with
that of the conventional Wiener filter based on the assumption that the
signal is present. As a second example, a similar filtering problem
with certain message presence but random jamming presence is
considered. The steady-state, mean-square error of the adaptive filter
is much less than that of a conventional filter designed on the basis
of no jamming, even though the jamming is assumed to have only one
chance in eleven of occurring.

II. STATEMENT OF THE PROBLEM


It is desired to form an estimate of a sampled-data, gaussian,
message process, possibly corrupted by additive noise, so that the
estimate minimizes a generalized mean-square-error performance measure.
The quantity being estimated may be either past, present, or future
values (or perhaps some linear function) of the message process. The
observable process is assumed to be a sampled-data, scalar-valued,
gaussian, random process whose mean value vector and/or covariance
matrix is unknown but is selected from a finite set of known vectors
and/or matrices. Thus, the parameters describing the process are
elements of a finite, known, parameter space.

A.  MODEL  OF  THE  PROCESS 

The observable, scalar-valued stochastic process
$\{z(t): t = 1, 2, \ldots\}$ can be considered to be a composite
stochastic process since it can be constructed from elemental
stochastic processes $\{z_i(t): t = 1, 2, \ldots;\ i = 1, \ldots, L\}$,
as illustrated in Fig. 1. The various elemental processes represent and
exhaust the set of possible parameter values for the observable
process. The switch is randomly connected to one of the L possible
switch positions and remains there throughout the duration of the
process. Let $\alpha_j$ denote that the switch is in position j, i.e.,
$z(t) = z_j(t)$. The a priori probabilities,
$\{P(\alpha_i): i = 1, \ldots, L\}$, of the switch being in each of the
L positions are assumed to be known. Since the observable process
(given that the switch is in position j) is a gaussian random process,
each of the elemental processes must be gaussian also. Each elemental
process is considered to be composed of a message component,
$\{y_i(t): t = 1, 2, \ldots\}$, and an additive gaussian noise
component, $\{n_i(t): t = 1, 2, \ldots\}$. Later it will be assumed
that the elemental processes are gauss-markov processes since this will
greatly simplify the calculations; however, at this point no such
assumption is necessary.

FIG. 1. MODEL OF OBSERVABLE STOCHASTIC PROCESS.
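
The switch model may be illustrated by a short simulation. The sketch
below is not part of the original development: the first-order message
recursion and the names `simulate_composite`, `phi`, `sigma_y`, and
`sigma_n` are assumptions chosen only for illustration, since at this
point the text requires nothing more than that each elemental process
be gaussian.

```python
import random

def simulate_composite(priors, elements, steps, rng):
    """Draw one switch position, then generate z(t) = y_i(t) + n_i(t).

    priors   : a priori probabilities P(alpha_i), summing to one
    elements : one dict per elemental process with hypothetical keys
               'phi' (message recursion coefficient), 'sigma_y' (message
               driving-noise std) and 'sigma_n' (observation-noise std)
    """
    # The switch is set once, at random, and never moves afterwards.
    i = rng.choices(range(len(priors)), weights=priors, k=1)[0]
    e = elements[i]
    y, z = 0.0, []
    for _ in range(steps):
        # Assumed message model: y(t+1) = phi * y(t) + gaussian driving noise.
        y = e["phi"] * y + rng.gauss(0.0, e["sigma_y"])
        z.append(y + rng.gauss(0.0, e["sigma_n"]))
    return i, z

# Two switch positions: message plus noise, or noise alone (sigma_y = 0).
elements = [
    {"phi": 0.9, "sigma_y": 1.0, "sigma_n": 0.5},
    {"phi": 0.0, "sigma_y": 0.0, "sigma_n": 0.5},
]
switch, z = simulate_composite([0.5, 0.5], elements, 100, random.Random(0))
```

The estimator never observes `switch`; it sees only the samples in `z`
and must infer the switch position from them.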

B.  EXAMPLES  OF  PROCESSES  WITH  UNKNOWN  PARAMETERS 

Numerous  examples  of  processes  with  unknown  parameters  exist  in 
nature.  Unfortunately,  unless  the  unknown  parameters  come  from  a  finite 
set  of  known  possible  parameter  values,  a  prohibitive  amount  of  data 
processing  is  required  to  calculate  the  optimal  estimates.  Fortunately, 
many  engineering  problems  meet  the  requirement  that  they  have  a  finite 
number  of  possible  parameter  values;  many  others  may  be  adequately 
approximated  by  that  assumption.  Three  examples  of  the  former  situation, 
which  were  briefly  mentioned  in  the  Introduction,  are  described  below. 

The  space-probe-telemetry  problem  may  be  represented  by  the  stochastic 
model  shown  in  Fig.  2.  The  first  elemental  process  is  composed  of  both 
message  and  noise  processes,  while  the  second  consists  of  the  noise 
process  alone.  Consequently,  throughout  the  duration  of  the  process 
the  received  signal  is  either  message  plus  noise  or  noise  alone.  The 
optimal  filter  must  learn  which  is  the  case. 

The random-jamming problem can be modeled as illustrated in Fig. 3.
The first elemental process is composed of a signal process plus a noise
process representing receiver noise. The second consists of the same
signal process plus a different noise process, which represents both
receiver noise and an independent, additive, gaussian, jamming process
of known covariance matrix.

FIG. 2. MODEL OF RANDOM PROCESS FOR WHICH THE PRESENCE OF THE MESSAGE
COMPONENT IS UNCERTAIN.

FIG. 3. MODEL OF RANDOM PROCESS FOR WHICH THE PRESENCE OF THE JAMMING
COMPONENT IS UNCERTAIN.

A  model  for  the  multi-class  target  prediction  problem  is  given  in 
Fig.  4.  The  L  elemental  processes  represent  the  different  classes  of 
targets  to  be  tracked.  The  noise  processes  are  assumed  to  be  the  same 
for  each  elemental  process,  while  the  message  processes  differ  in  a 
manner  adequate  to  represent  the  dynamics  of  various  classes  of  such 
targets  as  aircraft  and  missiles. 


FIG. 4. MODEL FOR MULTI-CLASS TARGET PREDICTION PROBLEM.

III. FORM OF THE OPTIMAL ESTIMATOR


The  basic  form  of  the  optimal  estimator  will  be  derived  in  this 
chapter.  Two  subsequent  chapters  will  consider  in  detail  the  required 
linear  and  nonlinear  data  processing,  respectively. 

The performance measure used is a generalization of mean-square
error, the most common criterion in use. This generalization is
necessary since the quantity or state of nature, denoted $\omega$,
being estimated may be vector-valued. Thus, in general, $\omega$ will
be a vector quantity, although it is to be understood that in a
particular case this vector may be a scalar. Similarly, matrix
quantities, which appear later, may be either matrix-, vector-, or
scalar-valued.

Specifically, the optimal estimate $\hat{\omega}$ of some state of
nature $\omega$ will be defined as the value of $\omega_{est}$ which
minimizes the following quadratic form:

$$E\{(\omega - \omega_{est})^T Q\, (\omega - \omega_{est}) \mid Z_t\},$$

where Q is a symmetric, positive definite matrix, the superscript T
denotes the transpose of a vector, and $E\{\cdot \mid Z_t\}$ denotes
the conditional mean operator given the available data vector $Z_t$
defined at time t as

$$Z_t \triangleq \{z(1), z(2), \ldots, z(t)\}.$$

Utilizing the trace identity

$$U^T A V = \mathrm{tr}\{V U^T A\},$$

where U and V are vectors, A a matrix, and $\mathrm{tr}\{\cdot\}$
denotes the trace operator upon a matrix, one may rewrite the above
quadratic form, by completing the square, as

$$(\omega_{est} - \hat{\omega})^T Q\, (\omega_{est} - \hat{\omega})
  + \mathrm{tr}\{Q\,(E\{\omega \omega^T \mid Z_t\} - \hat{\omega}\hat{\omega}^T)\},$$

where $\hat{\omega} \triangleq E\{\omega \mid Z_t\}$ and
$E\{\omega \omega^T \mid Z_t\}$ is the conditional second-moment matrix
of $\omega$.

Since only the first term, which is a positive definite form, depends
on $\omega_{est}$, the optimal estimate $\hat{\omega}$ is simply the
conditional mean of $\omega$. Furthermore, this estimate also minimizes
the trace of the covariance matrix of the error (the criterion used by
Rauch [Ref. 3]), as can be established by using the above trace
identity and letting Q = I, the identity matrix.
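
The completed-square form can be checked numerically. In the sketch
below, which is only an illustration, a finite set of two-dimensional
samples stands in for the conditional distribution of the state of
nature given the data, so that the conditional moments become sample
averages; the function names and the particular numbers are
hypothetical.

```python
import random

def quad(u, Q, v):
    """Return u^T Q v for 2-vectors and a 2x2 matrix given as nested lists."""
    return sum(u[r] * Q[r][c] * v[c] for r in range(2) for c in range(2))

def expected_cost(samples, Q, est):
    """Average of (w - est)^T Q (w - est) over the samples."""
    total = 0.0
    for w in samples:
        d = [w[0] - est[0], w[1] - est[1]]
        total += quad(d, Q, d)
    return total / len(samples)

def completed_square(samples, Q, est):
    """Return (est - m)^T Q (est - m) + tr{Q (S - m m^T)} and the mean m."""
    n = len(samples)
    m = [sum(w[k] for w in samples) / n for k in range(2)]
    S = [[sum(w[r] * w[c] for w in samples) / n for c in range(2)]
         for r in range(2)]
    d = [est[0] - m[0], est[1] - m[1]]
    # tr{Q C} = sum over r, c of Q[r][c] * C[c][r]
    trace = sum(Q[r][c] * (S[c][r] - m[c] * m[r])
                for r in range(2) for c in range(2))
    return quad(d, Q, d) + trace, m

rng = random.Random(1)
samples = [[rng.gauss(1.0, 2.0), rng.gauss(-1.0, 0.5)] for _ in range(500)]
Q = [[2.0, 0.5], [0.5, 1.0]]   # symmetric, positive definite
est = [0.3, -0.2]              # an arbitrary candidate estimate
lhs = expected_cost(samples, Q, est)
rhs, mean = completed_square(samples, Q, est)
```

The two values `lhs` and `rhs` agree to within rounding error, and
moving `est` to `mean` removes the first term, leaving only the trace
term, which no choice of estimate can reduce.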

An  interesting  property  of  the  conditional  mean  will  be  used  to 
derive  the  form  of  the  optimal  estimator. 

A.  DERIVATION  OF  THE  FORM  OF  THE  OPTIMAL  ESTIMATOR 

In the conventional estimation problem it is desired to form an
optimal estimate $\hat{\omega}$ of some state of nature $\omega$; e.g.,
in the filtering problem, $\omega = y(t)$. Since the optimal estimate is
the conditional mean, one calculates

$$\hat{\omega} = \int_{\Omega} \omega\, p(\omega \mid Z_t)\, d\omega \tag{3.1}$$

where

$\Omega \triangleq$ the space of all $\omega$,

$p(\omega \mid Z_t) \triangleq$ the conditional probability density
function of $\omega$ given the data vector $Z_t$.

Either the conditional density is known or it is possible to calculate
it, since the statistics of the random processes are presumed known.
Furthermore, in the usual estimation problem the conditional density is
gaussian and consequently linear data processing is optimum.

When the estimation problem involves an observable process whose
probability structure would be completely specified by the knowledge of
an unknown parameter vector $\alpha$, additional analysis is necessary.
For example, even though the elemental random processes are gaussian,
the conditional density $p(\omega \mid Z_t)$ will be nongaussian in
general, as will be shown later in the chapter. Consequently, nonlinear
calculations will be necessary to obtain the conditional mean. The
solution may be found by recognizing that the conditional density of
$\omega$ may be obtained from the joint conditional density of $\omega$
and $\alpha$ by integration over A, the space of all possible values of
$\alpha$. Thus, Eq. (3.1) becomes

$$\hat{\omega} = \int_{\Omega} \omega \int_{A} p(\omega, \alpha \mid Z_t)\, d\alpha\, d\omega,$$

which may be rewritten, by definition of $p(\omega \mid \alpha, Z_t)$,
as

$$\hat{\omega} = \int_{\Omega} \omega \int_{A} p(\omega \mid \alpha, Z_t)\, p(\alpha \mid Z_t)\, d\alpha\, d\omega.$$

Interchanging the order of integration, which is permissible so long as
the integrand is absolutely integrable [Ref. 9], and defining the
conditional estimate

$$\hat{\omega}(\alpha) \triangleq \int_{\Omega} \omega\, p(\omega \mid \alpha, Z_t)\, d\omega$$

leads to

$$\hat{\omega} = \int_{A} \hat{\omega}(\alpha)\, p(\alpha \mid Z_t)\, d\alpha. \tag{3.2}$$

Thus, the optimal estimate is formed by taking the complete set of
conditional estimates, weighting each with the conditional probability
that the appropriate parameter vector is true, and integrating over the
space of all possible parameter values. It should be noted that no
restrictive assumptions have been made about the probability laws in the
above derivation. For example, in the special case that $\alpha$ is
described by a discrete probability law, Eq. (3.2) may be rewritten (if
one has an aversion to the Dirac delta function) as

$$\hat{\omega} = \sum_{i} \hat{\omega}(\alpha_i)\, P(\alpha_i \mid Z_t). \tag{3.3}$$

For computational reasons, implementation of Eq. (3.3) will be easiest
when A is a finite set indexed on a small number of integers.
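
A minimal numerical sketch of Eq. (3.3) follows. It is restricted, as
an assumption made purely for illustration, to a single scalar
observation z = y + n in which the message y is zero-mean gaussian with
a variance that depends on the hypothesis; the function names and
numbers are hypothetical and are not taken from the text.

```python
import math

def normal_pdf(x, var):
    """Zero-mean gaussian density with variance var, evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def adaptive_estimate(z, priors, signal_vars, noise_var):
    """Estimate the message y from one observation z = y + n via Eq. (3.3).

    Under hypothesis alpha_i the message is zero-mean gaussian with variance
    signal_vars[i]; the noise is zero-mean gaussian with variance noise_var.
    """
    # Elemental (conditional) estimates: each one is linear in z.
    elemental = [sv / (sv + noise_var) * z for sv in signal_vars]
    # Weighting coefficients P(alpha_i | z) from Bayes' rule.
    likelihoods = [normal_pdf(z, sv + noise_var) for sv in signal_vars]
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    weights = [j / total for j in joint]
    # Eq. (3.3): weighted sum of the elemental estimates.
    estimate = sum(w * e for w, e in zip(weights, elemental))
    return estimate, weights, elemental

# Message present (variance 4) or absent (variance 0), equally likely a priori.
est, weights, elemental = adaptive_estimate(
    z=1.5, priors=[0.5, 0.5], signal_vars=[4.0, 0.0], noise_var=1.0)
```

Each elemental estimate is linear in z, but the weights themselves
depend on z, so doubling the observation does not double the combined
estimate.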

It is now apparent that Eq. (3.3) is directly applicable to the
estimation of the process represented in Fig. 1, since a one-to-one
correspondence may be made between the switch position and the parameter
vector that specifies the statistics of the process. Subsequently,
$\alpha_i$ will be used to denote both a particular parameter vector and
the corresponding switch position. A block diagram of the optimal
estimator for the stochastic process represented in Fig. 1 is shown in
Fig. 5. Since the weighting coefficients are probabilities and hence
must range between zero and one, they may be implemented by
potentiometers as shown. Figure 5 tacitly implies that the quantity
being estimated, $\omega$, is a scalar quantity. If $\omega$ were a
vector quantity, multiple ganged potentiometers might be desirable.


[Block diagram; recoverable labels: elemental estimators
$H(\alpha_i)$, weighting coefficients.]

FIG. 5. FORM OF OPTIMAL ADAPTIVE ESTIMATOR.

If as time progresses it is possible to learn which elemental
stochastic process is being observed, it is then intuitively reasonable
to expect the optimal estimator to converge to the appropriate Wiener
filter for that process. In terms of the block diagram, this means that
the weighting coefficient corresponding to the true switch position will
converge to one while all the rest will converge to zero. Under the
proper assumption about the elemental processes, this will be shown in
Chapter V to be the case.
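
The expected behavior of the weighting coefficients can be previewed
with a small experiment. The sketch below assumes, purely for
illustration, that every elemental process is white, zero-mean, and
gaussian, so that the likelihood factors sample by sample; this is much
simpler than the gauss-markov case treated later, and the function
names are hypothetical.

```python
import math
import random

def update_log_weights(log_w, z, variances):
    """One recursive Bayes step for white, zero-mean gaussian hypotheses."""
    # Add the log-likelihood of the new sample under each hypothesis.
    new = [lw - 0.5 * math.log(2.0 * math.pi * v) - 0.5 * z * z / v
           for lw, v in zip(log_w, variances)]
    # Normalize in the log domain to avoid numerical underflow.
    top = max(new)
    norm = top + math.log(sum(math.exp(x - top) for x in new))
    return [x - norm for x in new]

def run(true_index, variances, priors, steps, rng):
    """Generate samples from the true hypothesis and track the weights."""
    log_w = [math.log(p) for p in priors]
    for _ in range(steps):
        z = rng.gauss(0.0, math.sqrt(variances[true_index]))
        log_w = update_log_weights(log_w, z, variances)
    return [math.exp(lw) for lw in log_w]

# Hypothesis 0: signal plus noise (variance 2); hypothesis 1: noise alone.
weights = run(true_index=0, variances=[2.0, 1.0], priors=[0.5, 0.5],
              steps=2000, rng=random.Random(0))
```

In this experiment the weight attached to the true switch position
approaches one as samples accumulate, which is the behavior described
above.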

Equations (3.2) and (3.3) will have the most practical significance
when the conditional estimates
$\{\hat{\omega}(\alpha): \text{all } \alpha \in A\}$ are linear in the
observed data vector $Z_t$. In this case, the problem of constructing
an optimal estimator, which requires nonlinear data processing, is
factored into the calculation of a set of linear estimates and the
nonlinear calculation of a set of weighting coefficients. Fortunately,
under the proper assumptions, the calculation of the weighting
coefficients is not difficult. One may regard the optimal estimate
$\hat{\omega}$ as being constructed from a linear combination of vectors
in the space of linear estimates of $\omega$. The nonlinear calculations
involved are solely in the determination of the optimum values of the
weighting coefficients used in this linear combination. In the problem
statement given in Chapter II, the elemental processes were described as
gaussian random processes and, consequently, the conditional estimates
are linear in the observed data. Therefore, the problems considered in
this dissertation may be factored as described above.

Earlier in this section it was claimed that the conditional density
of the state of nature $\omega$ would be nongaussian in general. This
fact is demonstrated by reasoning similar to that used in deriving the
form of the optimal estimator. Thus, for the finite possible parameter
vector case,

$$p(\omega \mid Z_t) = \sum_{i=1}^{L} p(\omega \mid \alpha_i, Z_t)\, P(\alpha_i \mid Z_t).$$

Inasmuch as the densities
$\{p(\omega \mid \alpha_i, Z_t): i = 1, 2, \ldots, L\}$ are gaussian,
the resultant conditional density $p(\omega \mid Z_t)$, being a linear
combination of gaussian densities, is in general nongaussian. Exceptions
occur when $P(\alpha_i \mid Z_t) = 1$ for some i or when
$\alpha_i = \alpha_j$ for all i, j.
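
The nongaussian character of such a mixture can be seen from its
kurtosis, which equals 3 for any gaussian density. The sketch below
assumes zero-mean components for simplicity; the function name and the
numbers are hypothetical.

```python
def mixture_kurtosis(weights, variances):
    """Kurtosis E[x^4] / E[x^2]^2 of a mixture of zero-mean gaussians."""
    # A zero-mean gaussian with variance v has E[x^2] = v and E[x^4] = 3 v^2.
    second = sum(w * v for w, v in zip(weights, variances))
    fourth = sum(w * 3.0 * v * v for w, v in zip(weights, variances))
    return fourth / (second * second)

mixed = mixture_kurtosis([0.5, 0.5], [1.0, 9.0])      # two distinct components
certain = mixture_kurtosis([1.0, 0.0], [1.0, 9.0])    # one weight equal to one
identical = mixture_kurtosis([0.5, 0.5], [4.0, 4.0])  # identical components
```

Only the two exceptional cases named above return the gaussian value
of 3; the genuinely mixed case gives 4.92.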

B.  CONDITIONS  FOR  REDUCING  THE  NUMBER  OF  REQUIRED  LINEAR  ESTIMATORS 

Since the set A of parameter values as used generally in Eqs. (3.2)
and (3.3) may be a large finite or even a countable or uncountably
infinite set, the calculation of the set of all conditional estimates
$\{\hat{\omega}(\alpha): \text{all } \alpha \in A\}$ may not be
feasible. Since the elemental processes have been assumed to be gaussian
random processes, the conditional estimates are linear estimates;
nevertheless, the amount of calculation required may be prohibitively
large. Consequently, it is desirable to investigate assumptions that
might reduce the amount of required data processing.

Consider a subset A' of the parameter space A. If for all $\alpha$
in A' one can write

$$\hat{\omega}(\alpha) = S(\alpha) \cdot H Z_t, \tag{3.4}$$

where $S(\alpha)$ is a matrix function of $\alpha$ and where H is a
matrix operator (independent of $\alpha$) on the vector $Z_t$
corresponding to the dynamical part of the optimal estimator, then the
calculation of

$$\hat{\omega}(A') \triangleq \int_{A'} \hat{\omega}(\alpha)\, p(\alpha \mid Z_t)\, d\alpha$$

may be simplified as follows:

$$\hat{\omega}(A') = \left[ \int_{A'} S(\alpha)\, p(\alpha \mid Z_t)\, d\alpha \right] H Z_t.$$

In other words, the nondynamical portion of the elemental linear
estimator, i.e., $S(\alpha)$, is included in the calculation of the
weighting coefficients, and only one linear dynamical estimate, i.e.,
$H Z_t$, is required for the subset A'. Thus under the assumption stated
in Eq. (3.4) the amount of necessary data processing has been reduced.

Another condition that greatly simplifies the calculation of
$\hat{\omega}(A')$ is given below. If for all $\alpha$ in A' and all
$Z_t$ one can write

$$p(\alpha \mid Z_t) = p(\alpha), \tag{3.5}$$

then one may calculate $\hat{\omega}(A')$ by a linear operation upon the
observed data, i.e.,

$$\hat{\omega}(A') = F Z_t.$$

This follows by the gaussian assumption on the elemental processes,
which implies that $\hat{\omega}(\alpha) = H(\alpha) Z_t$, where
$H(\alpha)$ is the optimal estimator for the elemental process described
by the parameter vector $\alpha$. Thus,

$$\hat{\omega}(A') = \int_{A'} \hat{\omega}(\alpha)\, p(\alpha \mid Z_t)\, d\alpha
  = \int_{A'} H(\alpha) Z_t\, p(\alpha \mid Z_t)\, d\alpha.$$

Utilizing the assumption stated in Eq. (3.5) one finds that

$$\hat{\omega}(A') = \int_{A'} H(\alpha)\, p(\alpha)\, d\alpha \cdot Z_t = F Z_t,$$

where $F \triangleq \int_{A'} H(\alpha)\, p(\alpha)\, d\alpha$.
Furthermore, the above linear relation for $\hat{\omega}(A')$ holds in
general only under the assumption given by Eq. (3.5). Assume
$p(\alpha \mid Z_t) \neq p(\alpha)$; then

$$\int_{A'} \hat{\omega}(\alpha)\, p(\alpha \mid Z_t)\, d\alpha
  = \int_{A'} p(\alpha \mid Z_t)\, H(\alpha)\, d\alpha \cdot Z_t$$

and

$$\hat{\omega}(A') = G(Z_t) \cdot Z_t,$$

where $G(Z_t) \triangleq \int_{A'} p(\alpha \mid Z_t)\, H(\alpha)\, d\alpha$.

Thus, in general, $\hat{\omega}(A')$ is a nonlinear function of $Z_t$
since the matrix G has its elements determined by the vector $Z_t$ on
which it operates.
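
The contrast between the fixed operator F and the data-dependent
operator G can be illustrated with scalar quantities. In the sketch
below the gains, probabilities, and variances are hypothetical values
chosen for illustration, and a single scalar observation stands in for
the data vector.

```python
import math

def linear_estimate(z, gains, probs):
    """F * z with F = sum_i H(alpha_i) p(alpha_i); valid under Eq. (3.5)."""
    F = sum(h * p for h, p in zip(gains, probs))
    return F * z

def nonlinear_estimate(z, gains, priors, variances):
    """G(z) * z, where the weights p(alpha_i | z) depend on the data."""
    like = [math.exp(-0.5 * z * z / v) / math.sqrt(2.0 * math.pi * v)
            for v in variances]
    joint = [p * l for p, l in zip(priors, like)]
    total = sum(joint)
    G = sum(h * j / total for h, j in zip(gains, joint))
    return G * z

gains = [0.8, 0.2]        # hypothetical scalar elemental filters H(alpha_i)
probs = [0.5, 0.5]        # p(alpha_i)
variances = [5.0, 1.25]   # hypothetical variance of z under each hypothesis

# Superposition holds for the fixed-weight filter ...
lin_gap = (linear_estimate(3.0, gains, probs)
           - linear_estimate(1.0, gains, probs)
           - linear_estimate(2.0, gains, probs))
# ... but fails when the weights are computed from the data.
nonlin_gap = (nonlinear_estimate(3.0, gains, probs, variances)
              - nonlinear_estimate(1.0, gains, probs, variances)
              - nonlinear_estimate(2.0, gains, probs, variances))
```

Here `lin_gap` is zero to within rounding error, while `nonlin_gap` is
not, in keeping with the conclusion that the estimate is a nonlinear
function of the data unless Eq. (3.5) holds.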

One  cannot  expect  to  find  the  condition  stated  in  Eq.  (3.4) 
satisfied  frequently  in  practice.  With  respect  to  the  problems  posed 
in  this  dissertation,  it  will  be  found  that,  in  general,  the  conditional 
estimates  are  rather  complicated  functions  of  the  parameter  vector.  The 
desired  factorization  will  be  found  only  in  special  cases,  such  as 
filtering  a  white  message  process  of  unknown  power  from  a  white  noise 
process  of  known  power.  A  white  process  for  the  discrete-time  case 
is  defined  as  any  process  whose  time-covariance  matrix  is  the  identity 
matrix. 
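
The white-message special case can be sketched as follows. The sketch
assumes, for illustration only, that the elemental estimate of the
current message sample is a hypothesis-dependent scalar gain applied to
the latest observation, and that the weighting coefficients are already
available; the function names and arguments are hypothetical.

```python
def factored_estimate(z_t, posteriors, message_powers, noise_power):
    """Combine elemental filters that share one dynamical part, per Eq. (3.4).

    For a white message in white noise the elemental estimate is taken to be
    S(alpha) * z(t) with S(alpha) = power / (power + noise_power); the shared
    dynamical part H Z_t simply selects the latest sample z(t).
    """
    shared = z_t[-1]                                          # H Z_t
    gains = [p / (p + noise_power) for p in message_powers]   # S(alpha_i)
    combined_gain = sum(w * g for w, g in zip(posteriors, gains))
    return combined_gain * shared

def direct_estimate(z_t, posteriors, message_powers, noise_power):
    """The same quantity computed as a weighted sum of separate estimates."""
    return sum(w * (p / (p + noise_power)) * z_t[-1]
               for w, p in zip(posteriors, message_powers))

z_t = [0.4, -1.1, 2.0]
posteriors = [0.2, 0.5, 0.3]        # assumed weighting coefficients
message_powers = [0.5, 1.0, 4.0]    # candidate message powers
a = factored_estimate(z_t, posteriors, message_powers, noise_power=1.0)
b = direct_estimate(z_t, posteriors, message_powers, noise_power=1.0)
```

Both routes give the same estimate, but the factored form applies the
dynamical operation to the data only once.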

The  second  condition  corresponds  to  those  cases  in  which  learning 
is  impossible  and,  consequently,  there  is  no  point  in  performing  the 
calculation  necessary  to  obtain  p(a|Zt).  This  is  the  case  that  was 
treated  by  Rauch  in  Ref.  3  and  represents  the  reason  he  was  able  to 
use  only  one  linear  filter  for  optimal  estimation. 

C.  APPLICATION  TO  A  CONTROL  PROBLEM 

An important application of estimation theory is to statistical
control problems. In these problems the state of nature that must be
estimated is the state vector of the system equations, i.e.,
$\omega = x(t)$.

Consider a control system described by the linear difference
equations

$$x(t+1) = \Phi(t)\, x(t) + D(t)\, u(t) + A(t)\, v(t)$$
$$y(t) = m^T(t)\, x(t)$$

SEL-63-143 


where x(t) is the state vector of the control dynamics, $\Phi(t)$ is
the state-transition matrix, D(t) is the control-distribution matrix,
u(t) is the control vector, v(t) is a zero-mean, gaussian, random
driving force, A(t) is the random-driving-force distribution matrix,
m(t) is the output vector, and y(t) is an output of the plant. Further
suppose that for all t the quantity z(t) actually observed is y(t)
corrupted by a gaussian random variable, $\eta_i(t)$, which is
statistically independent of v(t); i.e., $z(t) = y(t) + \eta_i(t)$,
where $\{\eta_i(t): t = 1, 2, \ldots\}$, $(i = 1, 2, \ldots, L)$, is a
gaussian random process with L different, known, possible, statistical
characteristics.

If the following quadratic performance criterion is adopted,

$$J = E\left\{ \sum_{k=t}^{T} \left[ x^T(k+1)\, Q\, x(k+1) + u^T(k)\, \Psi\, u(k) \right] \right\},$$

where Q and $\Psi$ are symmetric, positive definite matrices, then it is
well known that the optimal control is a linear function of the optimal
estimate of the state vector. Thus,

$$u(t) = C(t)\, E\{x(t) \mid Z_t\} = C(t)\, \hat{x}(t \mid t).$$

Consequently, the result on adaptive estimation derived earlier in
this chapter is applicable to linear-dynamic, quadratic-cost control
problems when the observed output is corrupted by a gaussian process
described by an unknown parameter vector $\alpha$. Naturally, in the control
problem also, it is necessary to assume that $\alpha$ must come from a
finite set of known possible values if implementation of the control law
is to be feasible.

At this point the form of the optimal estimator has been found, and
one of its important applications has been stated. All that remains to
be done is to calculate the conditional estimates and the weighting
coefficients; this will be done in the next two chapters, respectively.
Actual evaluation of the weighting coefficients will involve some
nonlinear calculations in terms of the observed data. Fortunately, much
of the necessary calculation is linear and is provided by the
conditional estimators. Because of this labor-saving relation the
conditional estimators will be derived first. None of the results in
Chapter IV are new, but the calculation of the weighting coefficients is
so closely tied in with these results that it is helpful to derive them
here.

IV.  ELEMENTAL  ESTIMATORS


This chapter by itself represents a brief treatment of the estimation
of discrete-time, scalar-valued, gauss-markov processes whose statistics
are completely known. The results are not new and various portions have
been presented in Refs. 2, 7 and 10. However, this chapter does
represent the first time that these results have been derived in this
manner. It is believed that this derivation is developed from a unified
approach that clarifies the fundamentals. The concept of the
displaced-covariance equation, which is introduced here for the first
time, more closely relates the solutions of interpolation problems to
those of filtering and prediction problems.

Additionally,  the  projection  theorem  of  Hilbert  space  theory  is 
used  as  a  partial  basis  for  the  derivation  of  the  form  of  the  optimal 
estimator.  This  theorem  is  very  powerful  and  simplifies  the  derivation. 
An  appendix  is  devoted  to  elementary  aspects  of  Hilbert  space  theory 
since  this  approach  is  not  common  in  the  engineering  literature. 

It should be noted that, while the results obtained are only for
scalar-valued observable processes (i.e., one observable quantity at
a time), they could be extended to vector processes, as has been done in
Refs. 2, 7 and 10. This extension is not made here for three reasons.
First, scalar-observable processes are more common in practice. Second,
notational difficulties would tend to obscure the results when evaluating
the weighting coefficients in Chapter V. Finally, if multiple
observations were permitted, matrix inversions of the dimension of the
multiplicity would be needed for each step. This procedure represents a
considerable amount of calculation and may well not be worth the effort
compared to the following suboptimal procedure.

Imagine that $k$ observable quantities are present at one time. One
procedure would be to look at these quantities sequentially and regard
this sequence as a scalar process with a new structure at $k$ times the
original sampling rate. If any of the observable quantities were
independent of the others, they would have to be treated as separate
problems. This procedure is suboptimal in that most of the quantities
would not be used immediately. The loss in performance is not likely to
be great since all the data would be processed before the next group
of $k$ samples arrived. This type of sequential approach is mentioned
by Ho [Ref. 10], who suggests that a matrix-inversion lemma can avoid
matrix inversions altogether by processing one piece of data at a time.

It should be noted that the scalar-valued restriction applies only
to the observable process $\{z(t):\ t = 1, 2, \ldots\}$. The quantity being
estimated, $\omega$, may well be vector-valued.

A.  MODEL  OF  AN  ELEMENTAL  PROCESS 

Since this chapter deals with the estimation of a single elemental
process $\{z_i(t) = y_i(t) + n_i(t):\ t = 1, 2, \ldots\}$, there is no need for
the subscript or the word elemental; consequently they will be omitted
in most cases for notational convenience. It is to be understood that
the following analysis applies to each and every elemental process.

In this chapter it will be useful to make a further restriction on
the elemental processes. They will be assumed to be gauss-markov
processes; that is, they are gaussian random processes that possess a
generalized Markov property (explained below and in Ref. 7, page 17). This
assumption has also been made in Refs. 2, 3, and 7 since it enables the
sufficient statistic [Ref. 11] to remain of finite and fixed dimension,
thereby vastly simplifying the data processing and storage required.

For stationary processes, this assumption in terms of the Z-transform
theory of sampled-data systems means that the power spectral density can
be expressed exactly or adequately approximated by a ratio of
polynomials in $z$. Hence, this assumption is not unduly restrictive.

It should be noted that the terminology gauss-markov process as
used by Kalman [Ref. 7] is somewhat misleading since in general neither
the observable process $\{z(t):\ t = 1, 2, \ldots\}$ nor its signal and
noise components possess the strict Markov property. Rather, they are
derived by a linear operation on an implicit state vector $x(t)$, which
does possess the true Markov property, namely,

$$p[x(t+1) \mid x(t),\ x(t-1),\ \ldots] = p[x(t+1) \mid x(t)].$$

In order to clarify this point the terminology implicit-markov, gaussian
process will be adopted in this work.

It  is  assumed  that  the  processes  are  generated  by  linear  difference 
equations  as  described  below.  These  equations  are  known  since  either 
they  are  the  known  physical  structure  of  the  processes  or  they  have 
been  synthesized  to  generate  the  statistical  properties  of  the  processes. 

$$\begin{aligned}
s(t+1) &= \Phi_s(t)\,s(t) + D_s(t)\,u_s(t) \\
y(t) &= r^T(t)\,s(t)
\end{aligned} \tag{4.1}$$

$$\begin{aligned}
w(t+1) &= \Phi_w(t)\,w(t) + D_w(t)\,u_w(t) \\
n(t) &= h^T(t)\,w(t)
\end{aligned} \tag{4.2}$$

where

$s(t)$       is the state vector of the message process
$w(t)$       is the state vector of the noise process
$\Phi_s(t)$  is the state transition matrix of the message process
$\Phi_w(t)$  is the state transition matrix of the noise process
$D_s(t)$     is the distribution matrix of the message process
$D_w(t)$     is the distribution matrix of the noise process
$u_s(t)$     is a white gaussian vector process representing the
             driving force of the message process
$u_w(t)$     is a white gaussian vector process representing the
             driving force of the noise process
$r(t)$       is the output vector of the message process
$h(t)$       is the output vector of the noise process

It is possible to combine Eqs. (4.1) and (4.2) by a process of
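augmenting the state vector. One defines the following quantities:

$$x(t) \triangleq \begin{bmatrix} s(t) \\ w(t) \end{bmatrix}, \quad
v(t) \triangleq \begin{bmatrix} u_s(t) \\ u_w(t) \end{bmatrix}, \quad
\Phi(t) \triangleq \begin{bmatrix} \Phi_s(t) & 0 \\ 0 & \Phi_w(t) \end{bmatrix}, \quad
A(t) \triangleq \begin{bmatrix} D_s(t) & 0 \\ 0 & D_w(t) \end{bmatrix},$$

and

$$m^T(t) \triangleq [\, r^T(t) \mid h^T(t) \,].$$

The zero blocks that appear in the matrices $\Phi(t)$ and $A(t)$
represent areas in which all the elements of these matrices are
zero.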

Further, define

$$E\{v(t)\,v^T(t)\} \triangleq Q(t),$$

$$u(t) \triangleq Q^{-1/2}(t)\,v(t),$$

and

$$D(t) \triangleq A(t)\,Q^{1/2}(t).$$

If $Q^{-1}(t)$ does not exist, the dimensionality of the problem may be
reduced. The model for the complete process may now be represented by
the linear difference equation,

$$\begin{aligned}
x(t+1) &= \Phi(t)\,x(t) + D(t)\,u(t), & t &= -\infty, \ldots, -1, 0, 1, \ldots, \infty \\
z(t) &= m^T(t)\,x(t), & t &= 1, 2, \ldots
\end{aligned} \tag{4.3}$$

The driving force $u(t)$, which is a gaussian random noise process,
both spatially and temporally white, has covariance matrix

$$E\{u(j)\,u^T(k)\} = I\,\delta_{jk} \quad \text{for all integers } j, k,$$

where $\delta_{jk}$ is the Kronecker delta function.

Equation (4.3) is an adequate model to represent any zero-mean,
implicit-markov, sampled-data, gaussian, random process. Proper
structure of the input distribution matrix $D(t)$ permits any desired degree
of correlation between message and noise processes. By representing
the process in terms of its difference equation, nonstationary or
finite-duration processes may be handled with the same theoretical
procedure as infinite-duration stationary processes. In the latter
case the quantities $\Phi(t)$, $D(t)$, and $m(t)$ merely become constants
independent of time. It should be noted that the ranges of the
time-index sets of Eq. (4.3) differ. This difference is intended to reflect
the fact that only a finite number of observations is available at
present but that the internal or implicit structure of the process may
have existed for an infinitely long period. Thus, Eq. (4.3) may
represent a stationary process upon which observations are taken after time
$t = 0$. In the event that it is desired to represent a process whose
internal structure begins at time $t = 0$, it suffices to make $D(t)$
identically the zero matrix for $t < -1$.
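
As an illustration of Eq. (4.3), the following Python sketch generates
observations from a time-invariant two-state model: a first-order message
process plus white noise. The function names, the numerical values, and
the choice of a zero state before the first step are assumptions made
for this example only; they are not part of the original development.

```python
import random

def build_augmented_model(phi_s, d_s, r, phi_w, d_w, h):
    """Stack a scalar message model and a scalar noise model into the
    augmented quantities Phi, D, m of Eq. (4.3) (time-invariant case)."""
    Phi = [[phi_s, 0.0], [0.0, phi_w]]   # block-diagonal transition matrix
    D = [[d_s, 0.0], [0.0, d_w]]         # uncorrelated message and noise drives
    m = [r, h]                           # z(t) = r*s(t) + h*w(t)
    return Phi, D, m

def step(Phi, D, x, u):
    """One step of x(t+1) = Phi x(t) + D u(t)."""
    n = len(x)
    return [sum(Phi[i][j] * x[j] for j in range(n)) +
            sum(D[i][j] * u[j] for j in range(len(u)))
            for i in range(n)]

def simulate(Phi, D, m, steps, seed=0):
    """Return z(1), ..., z(steps). The state before the first step is
    zero, which is an assumption of this sketch."""
    rng = random.Random(seed)
    x = [0.0] * len(m)
    zs = []
    for _ in range(steps):
        # unit-variance, temporally and spatially white driving force
        u = [rng.gauss(0.0, 1.0) for _ in range(len(D[0]))]
        x = step(Phi, D, x, u)
        zs.append(sum(mi * xi for mi, xi in zip(m, x)))
    return zs

# Message: s(t+1) = 0.9 s(t) + 0.5 u_s(t); noise: w(t+1) = u_w(t) (white).
Phi, D, m = build_augmented_model(0.9, 0.5, 1.0, 0.0, 1.0, 1.0)
observations = simulate(Phi, D, m, steps=50)
```

Setting the off-diagonal entries of D to nonzero values would correlate
the message and noise processes, as the text notes.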

Figure  6  is  a  block-diagram  representation  of  Eq.  (4.3).  Wide 
arrows  are  used  to  represent  the  signal  flow  of  vector  quantities  while 
the  conventional  line  arrows  represent  scalar  quantities.  The  various 
blocks  perform  the  linear-matrix  operations  inscribed  in  them  on  the 
incoming  vector  quantities.  The  summer  is  intended  to  represent  a 
vector  summation. 

B.  RECURSIVE  FORM  OF  OPTIMAL  ESTIMATE 

In this section the form of the optimal estimator for an elemental
process is found. As a first step the projection theorem is introduced.
This theorem (which states a necessary and sufficient condition for an
optimal linear estimate) is applicable to the optimal estimate $\hat{\omega}$ (the
conditional mean of $\omega$) since gaussian statistics have been assumed for
the elemental process. For the second step the projection theorem is
used in conjunction with properties of the conditional mean to derive
the recursive form of the optimal estimate. The third step consists
of finding a general expression for the gain vectors that appear in the
recursive form. The final steps consist of finding specific expressions
for the gain vectors in four major forms of the estimation problem.

FIG. 6. MODEL OF ELEMENTAL SAMPLED STOCHASTIC PROCESS.

The common filtering problem will be solved first because of its
importance and since its solution is fundamental to the other forms.
Next, the prediction problem will be solved since its solution involves
only a simple extension of the analysis used for the filtering problem.
Finally, because of its difficulty, the interpolation problem will be
considered as two problems. The first of these is that of
fixed-relative-time interpolation; that is, one is interested in estimating
at each instant the value that a quantity had $|\gamma|$ samples ago. The
other problem is that of fixed-absolute-time interpolation; an example
of this is the estimation of the initial condition of the state vector.

As a result of the assumed structure of the message and noise
processes, a very useful property of optimal estimates results. The
implicit-markov assumption allows many related estimation problems to
be handled simultaneously with little more labor than necessary for one
estimation problem. More specifically, the optimal estimates $\hat{\omega}(j|t)$
of all quantities $\omega(j)$ that are linear functions of the state vector
$x(j)$ (i.e., $\omega(j) = A\,x(j)$ for some matrix $A$) may be simply found by
performing that linear operation on the optimal estimate of the state
vector. This fact follows because the optimal estimate is the
conditional mean. When the conditional-expectation operator is
applied to both sides of the linear relation, the following equality
results:

$$\hat{\omega}(j|t) \triangleq E\{\omega(j) \mid Z_t\} = A\,\hat{x}(j|t).$$

Therefore, the subsequent sections will be primarily devoted to the
estimation of the state vector, i.e., $\omega = x$, even though in many cases
it is a hypothetical quantity.

1. Development of Projection Theorem

The following theorem from Hilbert space theory [Ref. 12] is
applicable since the expectation operator $E\{\cdot\}$ operating on two
random-variable vectors $\omega$ and $v$ satisfies the properties of an
inner-product relation. That is, one may write

$$E\{\omega^T \cdot v\} \triangleq (\omega, v)$$

where $\omega$ and $v$ are regarded as vectors in a Hilbert space and
$(\cdot,\cdot)$ denotes the inner-product operator. Furthermore, the quantity
$(\omega, \omega)$, which is sometimes denoted $\|\omega\|^2$ since it is a measure of
the square of distance in the Hilbert space, is just the sum of the
mean-square values of the random-variable components of $\omega$. Hence,
the quantity $(\omega - \hat{\omega},\ \omega - \hat{\omega}) = \|\omega - \hat{\omega}\|^2$ is just the sum of the
mean-square errors of each component of the optimal estimate. Let $\Gamma(t)$
be defined as the linear space created by the sequence of observables
$z(1), z(2), \ldots, z(t)$. In other words, $\Gamma(t)$ consists of all quantities
that may be written as $H Z_t$ for some matrix $H$ (possibly a row vector).
By the assumptions of the problem, $\hat{\omega} \in \Gamma(t)$.

PROJECTION THEOREM: (special case of the general abstract theorem
stated and proved in Appendix A).

$$E\{(\omega - v)^T (\omega - v)\} \ge E\{(\omega - \hat{\omega})^T (\omega - \hat{\omega})\}$$

for all $v \in \Gamma(t)$ if and only if

$$E\{(\omega - \hat{\omega})^T \cdot v\} = 0$$

for all $v \in \Gamma(t)$.

Any random variables $u$ and $v$ that satisfy

$$E\{u^T \cdot v\} = 0$$

are said to be orthogonal, denoted $u \perp v$. The error term
$(\omega - \hat{\omega}) \triangleq \tilde{\omega}$ is called the residual.

Briefly, the projection theorem states that the residual is
orthogonal to the space of linear estimates. Thus, the optimal estimate
$\hat{\omega}$, which is a linear estimate, may be geometrically interpreted as the
perpendicular projection of $\omega$ on the linear space $\Gamma(t)$.

The  orthogonality  property  of  the  residual  will  prove  useful 
throughout  the  remaining  chapters  and  will  be  crucial  in  recognizing  a 
simple  method  for  determining  the  weighting  coefficients. 
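
The orthogonality condition can be checked numerically. The sketch below
is an illustration only: it uses a single scalar observable, replaces the
expectation by a sample average, and uses made-up numbers and
hypothetical function names. The best linear estimate is found by least
squares, and the residual is then tested for orthogonality to the
observable.

```python
import random

def least_squares_gain(omega, z):
    """Gain c minimizing the sample mean of (omega - c*z)^2."""
    return sum(o * zi for o, zi in zip(omega, z)) / sum(zi * zi for zi in z)

def mean_square(values):
    """Sample mean-square value."""
    return sum(v * v for v in values) / len(values)

rng = random.Random(1)
z = [rng.gauss(0.0, 1.0) for _ in range(1000)]          # observable
omega = [0.7 * zi + rng.gauss(0.0, 0.5) for zi in z]    # quantity to estimate

c = least_squares_gain(omega, z)
residual = [o - c * zi for o, zi in zip(omega, z)]

# Sample version of E{(omega - omega_hat) * v} with v = z, which spans
# the (one-dimensional) space of linear estimates in this example.
inner_product = sum(r * zi for r, zi in zip(residual, z)) / len(z)

# Any other linear estimate has a mean-square error at least as large.
other = [o - (c + 0.1) * zi for o, zi in zip(omega, z)]
```

Here `inner_product` vanishes to within rounding error, and the
mean-square value of `residual` does not exceed that of `other`.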

2.  Derivation  of  Recursive  Form  of  Optimal  Estimate 

In  this  section  the  recursive  form  of  the  optimal  estimate  will 
be  derived.  Except  for  gain  constants,  the  solution  of  the  estimation 
problem  will  be  found.  Later  sections  will  be  devoted  to  evaluating 
these  gain  constants  and  to  further  manipulations  that  will  yield  forms 
of  greater  intuitive  value  or  greater  computational  utility. 

Consider the following definitions for all integer values of $i$
and $j$.

$$\hat{x}(j|i) \triangleq E\{x(j) \mid Z_i\},$$

$$\tilde{x}(j|i) \triangleq x(j) - \hat{x}(j|i).$$

In words, $\hat{x}(j|i)$ represents the best estimate of the state vector
$x(j)$ at time $j$ given all the available data up to time $i$.
The quantity $\tilde{x}(j|i)$ is just the error of the best estimate $\hat{x}(j|i)$.
Thus, one has

$$x(t+\gamma) = \hat{x}(t+\gamma|t-1) + \tilde{x}(t+\gamma|t-1), \tag{4.4}$$

where $\gamma$ is some positive (prediction) or negative (interpolation)
integer and $t$ represents the integer corresponding to the present
sampling instant.

Since the optimal estimate is just the conditional mean, one
may apply the conditional mean operator $E\{\cdot \mid Z_t\}$ to Eq. (4.4) to obtain

$$\hat{x}(t+\gamma|t) = \hat{x}(t+\gamma|t-1) + E\{\tilde{x}(t+\gamma|t-1) \mid Z_t\}. \tag{4.5}$$

Taking the conditional expectation $E\{\cdot \mid Z_{t-1}\}$ on both sides of Eq. (4.4)
and utilizing the projection theorem, one finds that

$$E\{\tilde{x}(t+\gamma|t-1) \mid Z_{t-1}\} = \sum_{i=1}^{t-1} k(t+\gamma, i)\,\tilde{z}(i|i-1) = 0. \tag{4.6}$$

The series expansion of the projection is valid since the time series

$$\{\tilde{z}(i|i-1) \triangleq z(i) - \hat{z}(i|i-1):\ i = 1, 2, \ldots, t-1\}$$

spans $\Gamma(t-1)$. Likewise, since $\Gamma(t-1) \subset \Gamma(t)$,

$$E\{\tilde{x}(t+\gamma|t-1) \mid Z_t\} = \sum_{i=1}^{t} k(t+\gamma, i)\,\tilde{z}(i|i-1) = k(t+\gamma, t)\,\tilde{z}(t|t-1).$$

Since Eq. (4.6) must hold for all $Z_{t-1}$, one finds that $k(t+\gamma, i) = 0$
for $i < t$. Therefore, one may now write the fundamental recursive
relation of the optimal estimate.

$$\hat{x}(t+\gamma|t) = \hat{x}(t+\gamma|t-1) + k(t+\gamma, t)\,\tilde{z}(t|t-1) \quad \text{for } t = 1, 2, \ldots \tag{4.7}$$

Since the process is assumed to be zero mean, the initial value of Eq.
(4.7) is, for all integers $\gamma$,

$$\hat{x}(\gamma|0) = 0.$$

Intuitively, the vector $k(t+\gamma, t)$ represents a gain vector
that operates on the error signal $\tilde{z}(t|t-1)$ to provide a correction
vector for the previous best estimate of the state, $\hat{x}(t+\gamma|t-1)$.
The next section will express the gain vector in terms of the various
parameters of the process.

3.  Determination  of  the  Gain  Vector 

By applying the reasoning used in the introductory portion of
Chapter III, it is apparent that the optimal estimate, which is the
conditional mean, may be found by minimizing the trace of the following
covariance matrix,

$$P(t+\gamma|t) \triangleq E\{\tilde{x}(t+\gamma|t)\,\tilde{x}^T(t+\gamma|t)\}. \tag{4.8}$$

Likewise, define in general for all integers $i$ and $j$ the
covariance matrix

$$P(j|i) \triangleq E\{\tilde{x}(j|i)\,\tilde{x}^T(j|i)\} \tag{4.9}$$

and the displaced covariance matrix

$$R(j,k|i) \triangleq E\{\tilde{x}(j|i)\,\tilde{x}^T(k|i)\}. \tag{4.10}$$

Utilizing these definitions, Eq. (4.7), and the fact that
$\tilde{z}(t|t-1) = m^T(t)\,\tilde{x}(t|t-1)$, yields

$$\begin{aligned}
P(t+\gamma|t) = {} & P(t+\gamma|t-1) - R(t+\gamma,t|t-1)\,m(t)\,k^T(t+\gamma,t) \\
& - k(t+\gamma,t)\,m^T(t)\,R^T(t+\gamma,t|t-1) \\
& + k(t+\gamma,t)\,m^T(t)\,P(t|t-1)\,m(t)\,k^T(t+\gamma,t).
\end{aligned} \tag{4.11}$$

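Completing the square, applying the trace operator (denoted by $\mathrm{tr}\{\cdot\}$)
to both sides of Eq. (4.11), and using the trace identity yields

$$\begin{aligned}
\mathrm{tr}\{P(t+\gamma|t)\} = {} & \sigma^2(t|t-1)
\left[ k(t+\gamma,t) - \frac{R(t+\gamma,t|t-1)\,m(t)}{\sigma^2(t|t-1)} \right]^T
\left[ k(t+\gamma,t) - \frac{R(t+\gamma,t|t-1)\,m(t)}{\sigma^2(t|t-1)} \right] \\
& + \mathrm{tr}\{P(t+\gamma|t-1)\}
- \frac{m^T(t)\,R^T(t+\gamma,t|t-1)\,R(t+\gamma,t|t-1)\,m(t)}{\sigma^2(t|t-1)},
\end{aligned} \tag{4.12}$$

where

$$\sigma^2(t|t-1) \triangleq m^T(t)\,P(t|t-1)\,m(t) = \mathrm{Var}\{\tilde{z}(t|t-1)\}$$

and $\mathrm{Var}\{\cdot\}$ denotes the variance of the specified scalar-valued random
variable.

The gain vector enters into only the first term of Eq. (4.12).
Since that term is a positive definite form, the optimum gain vector is

$$k(t+\gamma,t) = \frac{R(t+\gamma,t|t-1)\,m(t)}{\sigma^2(t|t-1)}. \tag{4.13}$$

Equation (4.13) is the general expression for the gain vector.
Differences between the problems of prediction, filtering, and interpolation
enter only through the displaced-covariance matrix $R(t+\gamma,t|t-1)$.
Consequently, the subsequent sections devoted to these problems will be
composed primarily of iterative solutions of the displaced-covariance
equation in the various cases.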

4. Solution of the Filtering Problem

In this case $\gamma = 0$ and the displaced-covariance matrix
$R(t+\gamma,t|t-1)$ is simply the nondisplaced-covariance matrix
$P(t|t-1)$. Thus, the basic relations in this case are

$$\hat{x}(t|t) = \hat{x}(t|t-1) + k(t,t)\,\tilde{z}(t|t-1) \tag{4.14}$$

and

$$k(t,t) = \frac{P(t|t-1)\,m(t)}{\sigma^2(t|t-1)}. \tag{4.15}$$

Moreover,

$$\hat{x}(t|t-1) = \Phi(t-1)\,\hat{x}(t-1|t-1)$$

since $E\{u(t-1) \mid Z_{t-1}\} = 0$ because of the time independence of the
random driving force. Hence,

$$\hat{x}(t|t) = \Phi(t-1)\,\hat{x}(t-1|t-1) + k(t,t)\,\tilde{z}(t|t-1). \tag{4.16}$$

A block diagram of the optimal filter is depicted in Fig. 7.
It is of considerable interest to compare this diagram with Fig. 6,
which represents the model of the random process, since the optimal
filter contains a model of the internal structure of the process.

FIG. 7. MODEL OF OPTIMAL FILTER.

The next step is the iterative evaluation of the covariance
equation. As part of the problem statement it is assumed that the
a priori covariance matrix $P(1|0)$ is known. Therefore, in order to
evaluate $P(t|t-1)$ for all time it will be sufficient to relate
iteratively $P(t+1|t)$ and $P(t|t-1)$. Use of appropriate definitions
and Eq. (4.3) gives

$$\tilde{x}(t+1|t) = \Phi(t)\,\tilde{x}(t|t-1) + D(t)\,u(t) - \Phi(t)\,k(t,t)\,\tilde{z}(t|t-1).$$

Substitution of this expression in the definition of $P(t+1|t)$ yields

$$P(t+1|t) = \Phi(t)\,[I - k(t,t)\,m^T(t)]\,P(t|t-1)\,[I - k(t,t)\,m^T(t)]^T\,\Phi^T(t) + D(t)\,D^T(t). \tag{4.17}$$

Further reduction is possible by substitution of Eq. (4.15) into (4.17)
to give the covariance equation

$$P(t+1|t) = \Phi(t)\,[I - k(t,t)\,m^T(t)]\,P(t|t-1)\,\Phi^T(t) + D(t)\,D^T(t). \tag{4.18}$$
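
The reduction from Eq. (4.17) to Eq. (4.18) can be spelled out as a
worked step. Writing $P$ for $P(t|t-1)$, $k$ for $k(t,t)$, $m$ for $m(t)$,
and $\sigma^2$ for $\sigma^2(t|t-1) = m^T P m$ (shorthand introduced here for
brevity only), Eq. (4.15) gives $P m = \sigma^2 k$, so that

$$\begin{aligned}
[I - k\,m^T]\,P\,[I - k\,m^T]^T
&= P - k\,m^T P - P\,m\,k^T + (m^T P\,m)\,k\,k^T \\
&= P - k\,m^T P - \sigma^2 k\,k^T + \sigma^2 k\,k^T \\
&= [I - k\,m^T]\,P.
\end{aligned}$$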

Consequently, by sequential cyclic use of Eqs. (4.15) and (4.18), the
gain vector can be determined for all time. The optimal filter for
the state vector is now complete. The best estimate of any linear
function of the present-state vector is then found by applying that
linear function to $\hat{x}(t|t)$. Thus, for example, in the conventional
filtering problem the best estimate of the message is

$$\hat{y}(t|t) = [\, r^T(t) \mid 0 \,]\,\hat{x}(t|t),$$

where the zero block denotes the portion of the vector that has all
zero elements.

The covariance matrix of the error of the estimate $\hat{x}(t|t)$ may
be found by expanding the definition of $\tilde{x}(t|t)$. Thus,

$$P(t|t) = [I - k(t,t)\,m^T(t)]\,P(t|t-1). \tag{4.19}$$

Therefore, the error power of the filtered message is

$$\mathrm{Var}\{\tilde{y}(t|t)\} = [\, r^T(t) \mid 0 \,]\,P(t|t)\,[\, r^T(t) \mid 0 \,]^T.$$

Equations (4.18) and (4.19) may be combined to give

$$P(t+1|t) = \Phi(t)\,P(t|t)\,\Phi^T(t) + D(t)\,D^T(t),$$

which will be useful if it is necessary to evaluate the filtering
performance at each step.

It is interesting to note that the gain vector $k(t,t)$ distributes
the error signal $\tilde{z}(t|t-1)$ to the estimate of the state $\hat{x}(t|t)$
in such a manner that the output vector $m(t)$ operating on the state
estimate yields the present value, $z(t)$, of the observed process.
Thus,

$$m^T(t)\,\hat{x}(t|t) = \hat{z}(t|t) = z(t).$$

This equality, which obviously must hold if the estimator is to be
optimal, may be established by showing that the variance of $\tilde{z}(t|t)$ is
zero. Using appropriate definitions and Eqs. (4.19) and (4.15), one
finds that

$$\mathrm{Var}\{\tilde{z}(t|t)\} = m^T(t)\,P(t|t)\,m(t) = 0.$$

Since for any reasonable process the output vector $m(t)$ is not
identically the zero vector, it has been established that the matrix
$P(t|t)$ is nonnegative definite and not positive definite.
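
One cycle of the filter, Eqs. (4.14), (4.15), (4.18), and (4.19), can be
sketched in a few lines of plain Python. This is a minimal illustration
under stated assumptions, not a definitive implementation: the helper
names, the two-state example model, and the value of $P(1|0)$ are chosen
here for the example, and the innovation variance is assumed to be
nonzero.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def matmul(A, B):
    return [[dot(row, col) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def filter_step(Phi, D, m, x_pred, P_pred, z):
    """One filter cycle. Inputs are x_hat(t|t-1), P(t|t-1) and z(t);
    returns x_hat(t|t), P(t|t), x_hat(t+1|t), P(t+1|t)."""
    n = len(m)
    Pm = matvec(P_pred, m)
    sigma2 = dot(m, Pm)                        # variance of z~(t|t-1)
    k = [p / sigma2 for p in Pm]               # gain vector, Eq. (4.15)
    innovation = z - dot(m, x_pred)            # error signal z~(t|t-1)
    x_filt = [x + ki * innovation for x, ki in zip(x_pred, k)]  # Eq. (4.14)
    I_km = [[(1.0 if i == j else 0.0) - k[i] * m[j] for j in range(n)]
            for i in range(n)]
    P_filt = matmul(I_km, P_pred)              # Eq. (4.19)
    x_next = matvec(Phi, x_filt)               # x_hat(t+1|t)
    PhiPPhiT = matmul(matmul(Phi, P_filt), transpose(Phi))
    DDT = matmul(D, transpose(D))
    P_next = [[PhiPPhiT[i][j] + DDT[i][j] for j in range(n)]
              for i in range(n)]               # Eqs. (4.18) and (4.19) combined
    return x_filt, P_filt, x_next, P_next

# Assumed example: first-order message plus white noise, P(1|0) = I.
Phi = [[0.9, 0.0], [0.0, 0.0]]
D = [[0.5, 0.0], [0.0, 1.0]]
m = [1.0, 1.0]
x_filt, P_filt, x_next, P_next = filter_step(
    Phi, D, m, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], z=2.0)
```

For this example `dot(m, x_filt)` reproduces the observation `z`, and
`dot(m, matvec(P_filt, m))` is zero to within rounding, in agreement with
the two properties noted above.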
5. Solution of the Prediction Problem

For this problem, $\gamma > 0$; thus, the displaced covariance matrix
is simply related to the covariance matrix $P(t|t-1)$ and may be
evaluated as outlined below. Using Eq. (4.3) and pertinent definitions,
one may write

$$\tilde{x}(t+\gamma|t-1) = \Phi(t+\gamma-1)\,\tilde{x}(t+\gamma-1|t-1) + D(t+\gamma-1)\,u(t+\gamma-1), \quad \gamma > 0.$$

Substitution of this expression in the following definition yields

$$R(t+\gamma,t|t-1) \triangleq E\{\tilde{x}(t+\gamma|t-1)\,\tilde{x}^T(t|t-1)\}
= \Phi(t+\gamma-1)\,R(t+\gamma-1,t|t-1), \quad \gamma > 0, \tag{4.20}$$

where use has been made of the time independence of the random driving
force $u(t)$. Repetitive application of Eq. (4.20) implies that

$$R(t+\gamma,t|t-1) = \left[ \prod_{i=t+\gamma-1}^{t} \Phi(i) \right] P(t|t-1), \quad \gamma > 0, \tag{4.21}$$

where the product is taken in the order $\Phi(t+\gamma-1) \cdots \Phi(t)$.
Thus, by Eq. (4.13),

$$k(t+\gamma,t) = \left[ \prod_{i=t+\gamma-1}^{t} \Phi(i) \right] k(t,t), \quad \gamma > 0. \tag{4.22}$$

Now observe that, by repeated application of the fundamental recursive
relation (4.7),

$$\hat{x}(t+\gamma|t) = \sum_{i=1}^{t} k(t+\gamma,i)\,\tilde{z}(i|i-1) \tag{4.23}$$

for any integer $\gamma$. Combining Eqs. (4.22) and (4.23) yields

$$\hat{x}(t+\gamma|t) = \left[ \prod_{i=t+\gamma-1}^{t} \Phi(i) \right] \hat{x}(t|t) \quad \text{for } \gamma > 0. \tag{4.24}$$


Equation  (4.24)  represents  the  solution  to  the  prediction  problem.  It 
is  found  by  merely  performing  a  matrix  multiplication  upon  the  filtering 
solution  and,  consequently,  the  block  diagram  of  the  optimal  predictor 
is  such  a  minor  extension  of  Fig.  7  that  it  will  not  be  illustrated. 
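
Equation (4.24) amounts to pushing the filtered estimate forward through
the state-transition matrices. A minimal Python sketch, with a
hypothetical function name and an assumed time-invariant example, is:

```python
def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def predict(Phi_of, x_filt, t, gamma):
    """Return x_hat(t+gamma|t) from x_hat(t|t) for gamma > 0, per Eq. (4.24).
    Phi_of(i) returns the state-transition matrix Phi(i)."""
    x = list(x_filt)
    for i in range(t, t + gamma):   # Phi(t) acts first, Phi(t+gamma-1) last
        x = matvec(Phi_of(i), x)
    return x

# Time-invariant example (assumed values): the product reduces to a
# matrix power of Phi.
Phi = [[0.9, 0.0], [0.0, 0.5]]
x_pred = predict(lambda i: Phi, [1.0, 2.0], t=10, gamma=2)
```

Applying the matrices one at a time avoids forming the product
explicitly, which is convenient when $\Phi(i)$ varies with time.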

6.  Solution  of  the  Fixed-Relative-Time  Interpolation  Problem 

The fixed-relative-time interpolation problem is concerned
with estimating the state vector $x(t+\gamma)$ for all integer values of
$t$ and some specific negative integer $\gamma$. Thus, for example, at each
sampling instant it might be of interest to estimate the state vector
five samples ago.

Evaluation  of  the  displaced-covariance  matrix  is  most  difficult 
for  the  interpolation  problem.  This  factor  undoubtedly  explains  the 
avoidance  of  this  problem  in  the  earliest  works  employing  the  state- 
space  approach. 

Use of Eq. (4.7) and appropriate definitions yields the following
equations:

$$\tilde{x}(t+\gamma|i-1) = \tilde{x}(t+\gamma|i-2) - k(t+\gamma,i-1)\,\tilde{z}(i-1|i-2) \tag{4.25}$$

$$\tilde{x}(i|i-1) = \Phi(i-1)\,\tilde{x}(i-1|i-2) + D(i-1)\,u(i-1) - \Phi(i-1)\,k(i-1,i-1)\,\tilde{z}(i-1|i-2). \tag{4.26}$$

Substitution of these equations in the definition of the
displaced-covariance matrix gives

$$R(t+\gamma,i|i-1) = R(t+\gamma,i-1|i-2)\,[I - m(i-1)\,k^T(i-1,i-1)]\,\Phi^T(i-1) \quad \text{for } i > t+\gamma. \tag{4.27}$$

Repetitive application of Eq. (4.27) implies that

$$R(t+\gamma,i|i-1) = P(t+\gamma|t+\gamma-1) \left\{ \prod_{j=t+\gamma}^{i-1} [I - m(j)\,k^T(j,j)]\,\Phi^T(j) \right\}. \tag{4.28}$$

Thus, having iteratively found $P(t|t-1)$ by Eqs. (4.15) and (4.18), one
may find $R(t+\gamma,i|i-1)$ for all $i$ by use of Eq. (4.28).
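To solve the moving (or fixed-relative-time) interpolation
problem, relate $\hat{x}(t+\gamma|t)$ and $\hat{x}(t+\gamma-1|t-1)$.

$$\hat{x}(t+\gamma|t) = \sum_{i=1}^{t} k(t+\gamma,i)\,\tilde{z}(i|i-1)
= \sum_{i=1}^{t-1} k(t+\gamma,i)\,\tilde{z}(i|i-1) + k(t+\gamma,t)\,\tilde{z}(t|t-1) \tag{4.29}$$

$$\hat{x}(t+\gamma-1|t-1) = \sum_{i=1}^{t-1} k(t+\gamma-1,i)\,\tilde{z}(i|i-1) \tag{4.30}$$

where

$$k(t+\gamma,i) = \frac{R(t+\gamma,i|i-1)\,m(i)}{\sigma^2(i|i-1)}$$

and

$$k(t+\gamma-1,i) = \frac{R(t+\gamma-1,i|i-1)\,m(i)}{\sigma^2(i|i-1)}.$$

Thus, to find a relation between these two displaced-covariance matrices,
combine Eqs. (4.18), (4.20), and (4.28) to obtain

$$\begin{aligned}
R(t+\gamma,i|i-1) = {} & \Phi(t+\gamma-1)\,R(t+\gamma-1,i|i-1) \\
& + D(t+\gamma-1)\,D^T(t+\gamma-1) \left\{ \prod_{j=t+\gamma}^{i-1} [I - m(j)\,k^T(j,j)]\,\Phi^T(j) \right\}
\quad \text{for } i > t+\gamma-1
\end{aligned} \tag{4.31}$$

$$R(t+\gamma,i|i-1) = \Phi(t+\gamma-1)\,R(t+\gamma-1,i|i-1) \quad \text{for } i \le t+\gamma-1. \tag{4.32}$$

Thus

$$\begin{aligned}
k(t+\gamma,i) = {} & \Phi(t+\gamma-1)\,k(t+\gamma-1,i) \\
& + D(t+\gamma-1)\,D^T(t+\gamma-1) \left\{ \prod_{j=t+\gamma}^{i-1} [I - m(j)\,k^T(j,j)]\,\Phi^T(j) \right\} \frac{m(i)}{\sigma^2(i|i-1)}
\quad \text{for } i > t+\gamma-1
\end{aligned} \tag{4.33}$$

$$k(t+\gamma,i) = \Phi(t+\gamma-1)\,k(t+\gamma-1,i) \quad \text{for } i \le t+\gamma-1. \tag{4.34}$$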


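Rewriting Eq. (4.29) and utilizing (4.30), (4.33), and (4.34) one finds
that

$$\hat{x}(t+\gamma|t) = \sum_{i=1}^{t+\gamma-1} k(t+\gamma,i)\,\tilde{z}(i|i-1)
+ \sum_{i=t+\gamma}^{t-1} k(t+\gamma,i)\,\tilde{z}(i|i-1)
+ k(t+\gamma,t)\,\tilde{z}(t|t-1)$$

$$\begin{aligned}
\hat{x}(t+\gamma|t) = {} & \Phi(t+\gamma-1)\,\hat{x}(t+\gamma-1|t-1) + k(t+\gamma,t)\,\tilde{z}(t|t-1) \\
& + D(t+\gamma-1)\,D^T(t+\gamma-1) \sum_{i=t+\gamma}^{t-1}
\left\{ \prod_{j=t+\gamma}^{i-1} [I - m(j)\,k^T(j,j)]\,\Phi^T(j) \right\}
\frac{m(i)}{\sigma^2(i|i-1)}\,\tilde{z}(i|i-1).
\end{aligned} \tag{4.35}$$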


Equation  (4.35)  represents  the  fundamental  equation  relating  the 
successive  estimates  in  the  fixed-relative-time  interpolation  problem. 
Figure  8  illustrates  a  block  diagram  of  the  optimal  fixed-relative-time 
interpolator.  It  should  be  noted  that  an  optimal  filter  is  required  as 
part  of  the  interpolator.  Also  note  that  a  tapped  delay  line  of  length 
| 7j ,  which  has  | y\  different  time-variant  gain  vectors  for  tap  gain 
coefficients,  is  required.  Consequently,  for  large  values  of  | y\  , 
implementation  of  Eq.  (4.35)  will  require  a  large  amount  of  equipment 
and/or  computation. 

Under  proper  circumstances  Eq.  (4.35)  may  be  rewritten  and  a  simpli¬ 
fied  block  diagram  may  be  found.  If  it  is  assumed  that  the  system  dif¬ 
ference  equations  are  a  discrete-time  representation  of  a  continuous -time 
system  described  by  linear  differential  equations,  then  the  inverse  of 
the  state-transition  matrix  will  exist.  Furthermore,  if  it  is  also 
assumed  that  the  observed  process  is  really  z'(t)  =  z(t)  +  v(t)  where 
v(t )  is  a  white,  sero-mean,  gauss ian  process  of  finite  power,  then  the 
matrix  P(t|t)  will  have  an  inverse.  When  these  two  matrices  possess 
inverses,  the  following  equality  can  be  found  using  Bq.  (4.19)  and  (4.28), 


[l-m(j)  kT(j,j)]*T(j)) 


(t+7-l)  P-1(t+7-l|  t+7-l)  R(t+7-l,i| i-l). 


(4.36) 
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FIG.  8.  MODEL  OF  OPTIMAL  RELATIVE-TIME  II 


Equation (4.36) may be substituted into Eq. (4.35) to yield

$$\begin{aligned}
\hat{x}(t+\gamma|t) ={}& \Phi(t+\gamma-1)\,\hat{x}(t+\gamma-1|t-1) + k(t+\gamma,t)\,\tilde{z}(t|t-1) \\
&+ D(t+\gamma-1)D^{T}(t+\gamma-1)\,\Phi^{T^{-1}}(t+\gamma-1)\,P^{-1}(t+\gamma-1|t+\gamma-1) \\
&\quad\cdot \sum_{i=t+\gamma}^{t-1} R(t+\gamma-1,i|i-1)\,\sigma^{-2}(i|i-1)\,\tilde{z}(i|i-1) \\
={}& \Phi(t+\gamma-1)\,\hat{x}(t+\gamma-1|t-1) + k(t+\gamma,t)\,\tilde{z}(t|t-1) \\
&+ D(t+\gamma-1)D^{T}(t+\gamma-1)\,\Phi^{T^{-1}}(t+\gamma-1)\,P^{-1}(t+\gamma-1|t+\gamma-1) \\
&\quad\cdot \left[\hat{x}(t+\gamma-1|t-1) - \hat{x}(t+\gamma-1|t+\gamma-1)\right].
\end{aligned} \tag{4.37}$$

Equation (4.37) represents an alternate (and simpler) method of
writing Eq. (4.35) when the quantities $\Phi^{T^{-1}}(t+\gamma-1)$ and
$P^{-1}(t+\gamma-1|t+\gamma-1)$ exist. This form for the
fixed-relative-time interpolator was found previously by Rauch
[Ref. 13] using the same assumptions but different techniques.

7. Solution of the Fixed-Absolute-Time Interpolation Problem

In this case it is desired to estimate at each sampling instant
the state vector $x(j)$, at some fixed, absolute time $j$. Perhaps the
most common example of this is the estimation of the initial value of
the state vector, i.e., $x(0)$.

Repetitive application of Eq. (4.7) implies that for any integer
$t$ corresponding to the present time
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(4.38) 


1). 


The gain vector may be found from the displaced-covariance matrix that
was determined in the previous section. Therefore, Eq. (4.38) represents
the solution of the fixed-absolute-time interpolation problem. Figure 9
is a block diagram of the implementation of this solution. It is to be
noted again that the optimal estimator includes as part of its structure
the optimal filter.

At  this  point  the  major  estimation  problems  for  processes  with 
known  statistics  have  been  solved.  Thus,  the  construction  of  the 
elemental  estimators  of  Fig.  5  may  be  considered  to  be  complete.  The 
analysis  will  now  return  to  the  estimation  of  processes  with  unknown 
parameters. 


FIG. 9. MODEL OF OPTIMAL ABSOLUTE-TIME INTERPOLATOR.

V. CALCULATION OF THE WEIGHTING COEFFICIENTS


The remaining problem that must be solved is the calculation of the
weighting coefficients $\{P(\alpha_i|Z_t) : i = 1, 2, \ldots, L\}$. In some
sense this represents the truly interesting part of the analysis since it
is a study of the learning function of the optimal adaptive estimator. The
optimal estimator is called adaptive since its structure is a function
of the incoming data. This structure changes only through the weighting
coefficients and, consequently, they embody the learning or adaptive
feature of the estimator.

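By Bayes' rule,

$$P(\alpha_i|Z_t) = \frac{p(Z_t|\alpha_i)\,P(\alpha_i)}{\sum_{j=1}^{L} p(Z_t|\alpha_j)\,P(\alpha_j)}, \tag{5.1}$$

which may be rewritten, to avoid practical computational problems, as

$$P(\alpha_i|Z_t) = \left[1 + \sum_{j \neq i} \frac{p(Z_t|\alpha_j)\,P(\alpha_j)}{p(Z_t|\alpha_i)\,P(\alpha_i)}\right]^{-1}. \tag{5.2}$$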

Since the a priori probabilities $\{P(\alpha_i) : i = 1, 2, \ldots, L\}$ are
known constants, knowledge of the probability densities
$\{p(Z_t|\alpha_i) : i = 1, 2, \ldots, L\}$ will suffice to evaluate the
weighting coefficients. Because the elemental processes are gaussian,
they are described by the multivariate gaussian density function

$$p(Z_t|\alpha_i) = (2\pi)^{-t/2}\,|K_{Z,t}(i)|^{-1/2} \exp\left\{-\tfrac{1}{2}\,[Z_t - M_t(i)]^{T} K_{Z,t}^{-1}(i)\,[Z_t - M_t(i)]\right\}, \quad i = 1, 2, \ldots, L, \tag{5.3}$$

where $K_{Z,t}(i)$ is the covariance matrix of the first $t$ time samples
of the ith observable process $\{z_i(t) : t = 1, 2, \ldots\}$ and $M_t(i)$ is
the corresponding mean-value vector. For notational convenience it will
be assumed that for all $t$ and $i$, $M_t(i) = 0$, although this assumption
is not necessary for the succeeding analysis to apply.
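
As a point of reference for what follows, Eqs. (5.1) and (5.3) can be
evaluated directly when the full covariance matrices are available. The
sketch below is only illustrative: the function name
`posterior_weights` and its arguments are assumptions, not part of the
original analysis, and it works with log-densities so that the
exponentials cannot underflow. It is this direct evaluation, with its
growing determinants and inverses, that the next sections avoid.

```python
import numpy as np

def posterior_weights(z, covariances, priors):
    """Evaluate P(alpha_i | Z_t) by Bayes' rule for zero-mean gaussian
    hypotheses, given one covariance matrix K_{Z,t}(i) per hypothesis."""
    z = np.asarray(z, dtype=float)
    log_post = []
    for K, prior in zip(covariances, priors):
        K = np.asarray(K, dtype=float)
        _, logdet = np.linalg.slogdet(K)          # log |K_{Z,t}(i)|
        quad = z @ np.linalg.solve(K, z)          # Z^T K^{-1} Z
        log_like = -0.5 * (len(z) * np.log(2.0 * np.pi) + logdet + quad)
        log_post.append(np.log(prior) + log_like)
    log_post = np.array(log_post)
    log_post -= log_post.max()                    # guard against underflow
    weights = np.exp(log_post)
    return weights / weights.sum()                # normalization of Eq. (5.1)
```

With identical covariance matrices the data carry no information about
the hypothesis, and the weights returned are simply the a priori
probabilities.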

In order to evaluate the weighting coefficients it will be necessary
to calculate each determinant $|K_{Z,t}(i)|$ and each quadratic form
$Z_t^{T} K_{Z,t}^{-1}(i)\,Z_t$. At first thought it would appear that
insurmountable difficulties will be encountered as time progresses. One
is required to take the determinant of a matrix of ever-increasing
dimension and also to invert such a matrix. Fortunately, it is possible
to avoid these difficulties, and the next two sections will present the
required analyses. Briefly, the result is that the implicit-markov
assumption on the elemental processes greatly simplifies the calculation
of these quantities.

A.  EVALUATION  OF  THE  DETERMINANT 

The evaluation of the determinant $|K_{Z,t}(i)|$ may be simplified by
relating it to the previously required determinant $|K_{Z,t-1}(i)|$.
Consider the following equality (which is true by the definition of the
conditional probability density):

$$p(Z_t) = p[z(t)|Z_{t-1}]\;p(Z_{t-1}). \tag{5.4}$$

Since the elemental processes are gaussian, $p[z(t)|Z_{t-1}]$ is a gaussian
density with mean $\hat{z}(t|t-1)$ and variance $\sigma^2(t|t-1)$. Substitute
this density and the appropriate multivariate gaussian distributions in
Eq. (5.4). Identification of like coefficients on both sides of this
version of Eq. (5.4) yields

$$|K_{Z,t}(i)| = \sigma_i^2(t|t-1)\;|K_{Z,t-1}(i)|, \quad i = 1, 2, \ldots, L. \tag{5.5}$$
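
The coefficient identification can be written out explicitly. Under the
zero-mean assumption the exponents vanish at the origin, so the
normalizing constants of the three gaussian densities in Eq. (5.4) must
satisfy

$$(2\pi)^{-t/2}\,|K_{Z,t}(i)|^{-1/2} = \left[2\pi\,\sigma_i^2(t|t-1)\right]^{-1/2} (2\pi)^{-(t-1)/2}\,|K_{Z,t-1}(i)|^{-1/2}.$$

The powers of $2\pi$ cancel, and squaring and inverting both sides gives
the recursion of Eq. (5.5).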

It is possible to avoid evaluating any determinant by iterating Eq. (5.5).
Thus,

$$|K_{Z,t}(i)| = \prod_{j=1}^{t} \sigma_i^2(j|j-1), \quad i = 1, 2, \ldots, L. \tag{5.6}$$

Furthermore, the quantities $\{\sigma_i^2(j|j-1) : i = 1, 2, \ldots, L;\ j = 1, 2, \ldots\}$
have already been calculated by Eq. (4.12) in order to implement the gain
vectors [Eq. (4.13)] needed in the elemental estimators.

Intuitively, the significance of Eq. (5.6) is given by the following
statement. The determinant of the covariance matrix is the product of
the variances of the one-step prediction errors. Thus, the matrix
$K_{Z,t}(i)$ will be invertible if and only if at each sampling instant it
is impossible to predict perfectly the next value of the process.
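
The statement that the determinant equals the product of the one-step
prediction variances can be checked numerically. The following is a
minimal sketch under stated assumptions: the test process is a
hypothetical stationary first-order process with unit-variance driving
noise, and the prediction variances are read from the causal (Cholesky)
factor of the covariance matrix rather than from Eq. (4.12).

```python
import numpy as np

def ar1_covariance(phi, t):
    """Covariance of t samples of a stationary first-order process
    x(k+1) = phi*x(k) + u(k) with unit-variance u (hypothetical example)."""
    idx = np.arange(t)
    return phi ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - phi ** 2)

def one_step_prediction_variances(K):
    """Squared diagonal of the lower-triangular Cholesky factor of K;
    entry j is the variance of sample j given all earlier samples."""
    L = np.linalg.cholesky(np.asarray(K, dtype=float))
    return np.diag(L) ** 2

K = ar1_covariance(0.8, 6)
variances = one_step_prediction_variances(K)
det_by_product = np.prod(variances)   # product of prediction variances
det_direct = np.linalg.det(K)         # direct determinant, for comparison
```

For this process the first variance is the stationary variance
1/(1 - 0.8**2) and every later one is the driving-noise variance, so the
product form never requires a determinant of growing dimension.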

The determinant $|K_{Z,t}(i)|$ may be related to an important concept
from information theory. The concept is that of the average information
or the entropy of the process and is defined as

$$H(Z_t) = -\int p(Z_t)\,\log p(Z_t)\,dZ_t. \tag{5.7}$$

For an elemental process, substitution of the appropriate gaussian
density and integration gives

$$H_i(Z_t) = \tfrac{1}{2}\log\left[(2\pi e)^{t}\,|K_{Z,t}(i)|\right], \quad i = 1, 2, \ldots, L. \tag{5.8}$$

One then can make the following statement. The matrix $K_{Z,t}(i)$ will be
invertible if the entropy of the gaussian process whose covariance
matrix is $K_{Z,t}(i)$ is finite. The entropy of the process may be
expressed in terms of the one-step prediction variances, as

$$H_i(Z_t) = \tfrac{t}{2}\log(2\pi e) + \tfrac{1}{2}\sum_{j=1}^{t} \log \sigma_i^2(j|j-1), \quad i = 1, 2, \ldots, L. \tag{5.9}$$

Similar results relating entropy to the one-step prediction variance
have been obtained previously by Elias [Ref. 14], Price [Ref. 15], and
Gel'fand and Yaglom [Ref. 16].

B.  EVALUATION  OF  THE  QUADRATIC  FORM 

The quadratic form $Z_t^{T} K_{Z,t}^{-1}(i)\,Z_t$ may be thought of as the
sum of the squared time samples of a scalar-valued random process
$\{w(t) : t = 1, 2, \ldots\}$. If the vector $W_t$ is defined as

$$W_t^{T} = [w(1),\ w(2),\ \ldots,\ w(t)],$$

then

$$Z_t^{T} K_{Z,t}^{-1}(i)\,Z_t = W_t^{T} W_t, \quad i = 1, 2, \ldots, L, \tag{5.10}$$

implies that

$$W_t = K_{Z,t}^{-1/2}(i)\,Z_t, \quad i = 1, 2, \ldots, L. \tag{5.11}$$

The matrix $K_{Z,t}^{-1/2}(i)$ is known as the bleaching [Ref. 17] or
whitening filter for the ith elemental process, and it may be either the
symmetric or causal square root of the inverse of the covariance matrix.
The latter interpretation will be used here since then the calculations
to be performed are physically realizable.

If the vector of observations is actually generated by the ith
elemental process, the vector $W_t$ will be white. Thus, one desires to
find in each elemental estimator a process that is white when the
estimator matches the observed process. Fortunately, it is possible to
find such a process: it is the normalized version of the one-step
prediction error, $\tilde{z}(t|t-1)$. Before demonstrating this fact it
will be helpful to provide the following definitions.

The one-step prediction error of the ith estimator operating on the
jth elemental process is denoted $\tilde{z}_{ij}(t|t-1)$. Therefore,

$$\tilde{z}_{ij}(t|t-1) = z_j(t) - \hat{z}_{ij}(t|t-1), \quad i, j = 1, 2, \ldots, L,$$

where $\hat{z}_{ij}(t|t-1)$ is the estimate given by the ith elemental
estimator operating on data from the jth elemental process. Thus,
$\hat{z}_{ij}(t|t-1)$ is in $\Gamma_j(t-1)$, the space spanned by the jth
time series. The one-step prediction error of the ith estimator operating
on an unspecified process is denoted $\tilde{z}_i(t|t-1)$. Hence, in a
particular example, $\tilde{z}_i(t|t-1) = \tilde{z}_{ij}(t|t-1)$ for some
$j = 1, 2, \ldots, L$, just as $z(t) = z_j(t)$ for the same integer $j$.

In terms of the above notation, the matched one-step prediction-error
processes $\{\tilde{z}_{ii}(t|t-1) : t = 1, 2, \ldots;\ i = 1, 2, \ldots, L\}$
can be shown to have independent time samples by use of the projection
theorem.

By the projection theorem

$$\tilde{z}_{ii}(t|t-1) \perp v \quad \text{for all } v \in \Gamma_i(t-1).$$

Now $\tilde{z}_{ii}(t-1|t-2) \in \Gamma_i(t-1)$, since

$$\tilde{z}_{ii}(t-1|t-2) = z_i(t-1) - \hat{z}_{ii}(t-1|t-2) \tag{5.12}$$

and $z_i(t-1) \in \Gamma_i(t-1)$ and
$\hat{z}_{ii}(t-1|t-2) \in \Gamma_i(t-2) \subset \Gamma_i(t-1)$.
Likewise, $\tilde{z}_{ii}(j|j-1) \in \Gamma_i(j)$ for all positive integers
$j$. Furthermore, the following ordering relation among the linear spaces
holds:

$$\Gamma_i(1) \subset \Gamma_i(2) \subset \cdots \subset \Gamma_i(t-1) \subset \Gamma_i(t).$$

Therefore,

$$\tilde{z}_{ii}(j|j-1) \in \Gamma_i(t-1) \quad \text{for all } j < t$$

and

$$\tilde{z}_{ii}(t|t-1) \perp \tilde{z}_{ii}(j|j-1) \quad \text{for all } j < t$$

and for all positive integers $t$. Consequently, the time series
$\{\tilde{z}_{ii}(t|t-1) : t = 1, 2, \ldots\}$ has independent time samples
with variance $\sigma_i^2(t|t-1)$. The time series
$\{w(t) : t = 1, 2, \ldots\}$ is obtained by

$$w(t) = \sigma_i^{-1}(t|t-1)\;\tilde{z}_{ii}(t|t-1). \tag{5.13}$$

Hence, the quadratic form $Z_t^{T} K_{Z,t}^{-1}(i)\,Z_t$ may be obtained by
squaring the error signal $\tilde{z}_i(j|j-1)$, normalizing, and
accumulating or summing over time. The procedure is illustrated in Fig. 9.
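
The square-normalize-accumulate procedure can be illustrated with a
small numerical sketch. It is only an illustration under stated
assumptions: the covariance is that of a hypothetical first-order
process, and the normalized prediction errors are produced by the causal
(lower-triangular Cholesky) square root of the covariance matrix instead
of by an elemental estimator.

```python
import numpy as np

def ar1_covariance(phi, t):
    """Covariance of t samples of a stationary first-order process with
    unit-variance driving noise (hypothetical example process)."""
    idx = np.arange(t)
    return phi ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - phi ** 2)

def normalized_prediction_errors(z, K):
    """Causal whitening W = L^{-1} Z with K = L L^T; element t is the
    one-step prediction error at time t divided by its standard deviation."""
    L = np.linalg.cholesky(K)
    return np.linalg.solve(L, z)

K = ar1_covariance(0.8, 5)
z = np.array([0.5, -1.2, 0.3, 2.0, -0.7])      # arbitrary observations
w = normalized_prediction_errors(z, K)
quad_accumulated = np.cumsum(w ** 2)           # running sum of squares
quad_direct = z @ np.linalg.solve(K, z)        # Z^T K^{-1} Z, for comparison
```

Because the factor is lower triangular, the first k whitened samples
depend only on the first k observations, which is the physical
realizability referred to in the text; the last entry of
`quad_accumulated` equals the full quadratic form.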


C.  COMPLETE  WEIGHTING-COEFFICIENT  CALCULATOR 


For  simplicity  of  illustration,  the  complete  weighting-coefficient 
calculator  will  be  described  for  the  case  of  only  two  elemental  processes. 
The  analysis  may  be  extended  to  problems  with  more  elemental  processes 
simply  by  repeating  for  each  additional  process  the  appropriate  portions 
of  the  subsequent  calculations. 

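For the dual elemental process situation Eq. (5.2) becomes

$$P(\alpha_1|Z_t) = \left[1 + \frac{p(Z_t|\alpha_2)\,P(\alpha_2)}{p(Z_t|\alpha_1)\,P(\alpha_1)}\right]^{-1}, \qquad P(\alpha_2|Z_t) = 1 - P(\alpha_1|Z_t). \tag{5.14}$$

The ratio of the gaussian densities is calculated as follows:

$$\frac{p(Z_t|\alpha_2)}{p(Z_t|\alpha_1)} = \left[\frac{|K_{Z,t}(1)|}{|K_{Z,t}(2)|}\right]^{1/2} \exp\left\{-\tfrac{1}{2}\left[Z_t^{T} K_{Z,t}^{-1}(2)\,Z_t - Z_t^{T} K_{Z,t}^{-1}(1)\,Z_t\right]\right\}. \tag{5.15}$$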


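Using previous results, this becomes

$$\frac{p(Z_t|\alpha_2)}{p(Z_t|\alpha_1)} = \left[\prod_{j=1}^{t} \frac{\sigma_1(j|j-1)}{\sigma_2(j|j-1)}\right] \exp\left\{-\frac{1}{2}\sum_{j=1}^{t}\left[\frac{\tilde{z}_2^2(j|j-1)}{\sigma_2^2(j|j-1)} - \frac{\tilde{z}_1^2(j|j-1)}{\sigma_1^2(j|j-1)}\right]\right\}. \tag{5.16}$$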

Note that Eq. (5.16) avoids a potential numerical difficulty of Eq. (5.15)
by accumulating term by term the difference of the normalized, squared,
error signals rather than taking the difference between the two large
quadratic forms. Further note that the implementation of the exponential
of Eq. (5.16) needs to be accurate over only a reasonable dynamic range.
By the time the argument of the exponential function becomes very large
in magnitude the weighting coefficients will have converged very close
to one and zero. In this situation any errors in implementing the
exponential function will have negligible effect on the optimal estimate.
Consequently, for most purposes the exponential may be formed by an
analog, diode function generator. The required square and square-root
operations may be formed in the same fashion. It may also be desirable
to form the inverse operation of Eq. (5.14) by a function generator
rather than by a division operation on a digital computer.

Figure 10 represents a block diagram of a method of implementing
Eq. (5.14). The input signals are available from the optimal estimators.
The variances $\{\sigma_i^2(t|t-1) : t = 1, 2, \ldots;\ i = 1, 2, \ldots, L\}$
have been calculated in advance to construct the optimal estimator.
The square root of the ratio of the variances may also be calculated in
advance. This is assumed to be the case in the block diagram. The output
signals, which are the values of the weighting coefficients, either
control the tap positions of the potentiometers of Fig. 5, or else they
and the set of conditional estimates are processed by an appropriate set
of digital multipliers.
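
A digital counterpart of the two-process calculator might look like the
sketch below. It is not the block diagram of Fig. 10; the function name
and argument layout are assumptions made for illustration. The
prediction errors and their variances are taken as given inputs from
the two elemental estimators, the logarithm of the density ratio is
accumulated term by term as in Eq. (5.16), and the inverse operation of
Eq. (5.14) is evaluated in a form that cannot overflow.

```python
import math

def two_hypothesis_weights(errors1, vars1, errors2, vars2, prior1):
    """Return P(alpha_1 | Z_t) for t = 1, 2, ... given, for each elemental
    estimator, its one-step prediction errors and their variances."""
    # Log of the prior ratio P(alpha_2) / P(alpha_1).
    log_ratio = math.log((1.0 - prior1) / prior1)
    weights = []
    for e1, v1, e2, v2 in zip(errors1, vars1, errors2, vars2):
        # Per-sample increment of the log density ratio, Eq. (5.16).
        log_ratio += 0.5 * math.log(v1 / v2) - 0.5 * (e2 * e2 / v2 - e1 * e1 / v1)
        # P(alpha_1 | Z_t) = 1 / (1 + exp(log_ratio)), Eq. (5.14).
        if log_ratio > 0.0:
            ex = math.exp(-log_ratio)
            weights.append(ex / (1.0 + ex))
        else:
            weights.append(1.0 / (1.0 + math.exp(log_ratio)))
    return weights
```

The second weighting coefficient follows as one minus the first. When
both estimators report identical errors and variances the increment is
zero and the weight stays at its a priori value.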

D.  CONVERGENCE  OF  THE  WEIGHTING  COEFFICIENTS 

Now that the description of the detailed structure of the optimal
estimator is complete, some comments about its performance are
appropriate. The convergence of the weighting coefficients is of
particular interest since they embody the adaptive or learning feature
of the optimal estimator. Because of the complex nature of the problem,
it is possible to give only a sufficient condition for the convergence of
the weighting coefficients. Because of the analytical complexity of the
probability distributions involved, it is not feasible to obtain an
expression for the rates of convergence.

The result is that, if all the elemental stochastic processes are
ergodic, the weighting coefficients will converge with probability
one to unity for the coefficient corresponding to the true process and
to zero for the others. This fact stems directly from Theorem 5.1 of
Ref. 8, which is stated (with notational changes) without proof below.
This theorem represents a minor modification of a result, given by
Loève [Ref. 18], which is derived from abstract probability theory.

FIG. 10. BLOCK DIAGRAM OF WEIGHTING-COEFFICIENT CALCULATOR.

Theorem:

If there exists a sequence of functions $\{f_{i,t}(Z_t) : t = 1, 2, \ldots\}$
of the learning observations $\{z_i(t) : t = 1, 2, \ldots\}$ from class $i$
such that $\lim_{t\to\infty} f_{i,t}(Z_t)$ is equal to the true value of the
parameter with probability one, then

$$\lim_{t\to\infty} P(\alpha_j|Z_t) = \delta_{ji}, \quad j = 1, 2, \ldots, L,$$

with probability one.

For the problem considered in this paper, the sequence of functions
is just the sample covariance matrix and/or the sample mean value. It
is well known [Ref. 9] that if the elemental processes are ergodic the
sample covariance matrix and/or the sample mean value will converge with
probability one to the true covariance matrix and/or mean value, i.e.,
true parameter vector $\alpha_i$. Thus, ergodicity of the elemental
processes is sufficient to guarantee that the optimal estimator will
converge in the limit with probability one to the appropriate Wiener
filter.

It should be noted that ergodicity of the elemental processes may
not be necessary, although it is sufficient. For example, the elemental
processes may be nonergodic, but the time-variant changes in the
statistics are of such a small magnitude that convergence occurs anyway.

If the elemental processes are nonstationary the weighting
coefficients may not converge. Even in this case it should be recognized
that optimal data processing is being performed by the adaptive estimator;
convergence of the weighting coefficients is simply precluded by the
complicated nature of the problem posed.
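
The convergence behavior can be observed in a small simulation. The
sketch below rests on strong simplifying assumptions that are not part
of the analysis above: both elemental processes are white, so the
one-step prediction is zero and the prediction error is the observation
itself; the two variances and the seed are hypothetical; and the a
priori probabilities are equal.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
var_true, var_other = 1.0, 4.0      # hypothetical elemental-process variances
z = rng.normal(0.0, np.sqrt(var_true), size=T)

# Per-sample increment of log[p(Z_t|other) / p(Z_t|true)] for white
# processes: the prediction error is z(t), its variance the process variance.
step = 0.5 * np.log(var_true / var_other) \
     - 0.5 * (z ** 2 / var_other - z ** 2 / var_true)
log_ratio = np.cumsum(step)

# Weight of the true process for equal priors; the clip only keeps the
# exponential inside the floating-point range.
weight_true = 1.0 / (1.0 + np.exp(np.clip(log_ratio, -700.0, 700.0)))
```

On average each increment is negative when the data come from the
"true" process, so the accumulated log ratio drifts downward and
`weight_true` approaches unity as the theorem indicates.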
VI. EXAMPLES


Two examples that have been chosen for their simplicity and practical
interest are evaluated in this chapter. The first example deals with
a filtering problem in which presence of the message is a random variable.
The second reverses the situation and considers the presence of a portion
of the additive noise as a random variable. Consequently, in both cases,
two elemental processes suffice to describe the observed process. In
both cases the steady-state performance of the adaptive filter is found
to be significantly better than that of a conventional filter.


A.  EXAMPLE  A 

This example is meant to represent a specific case of the
random-message-presence situation described in Chapter II and represented
in Fig. 2. It is desired to perform the best filtering to separate the
message (if present) from the noise. It will be assumed that the message
and noise processes are stationary and may be described by the
following difference equations.

$\alpha_1$ : Message present

$$\begin{aligned}
x_1(t+1) &= \phi\,x_1(t) + u_1(t) \\
x_2(t+1) &= u_2(t) \\
z_1(t) &= m_1 x_1(t) + m_2 x_2(t)
\end{aligned} \tag{6.1}$$

$\alpha_2$ : Message absent

$$\begin{aligned}
x_2(t+1) &= u_2(t) \\
z_2(t) &= m_2 x_2(t)
\end{aligned} \tag{6.2}$$

The driving forces $\{u_i(t) : t = -\infty, \ldots, -1, 0, 1, \ldots, \infty;\ i = 1, 2\}$
are independent, gaussian random processes with independent time samples
of unity variance.

Four possible situations might exist with a nonadaptive filter. The
steady-state, mean-square errors (MSE) for these cases are defined as
follows:

1. $\beta_1 \triangleq$ MSE when message present but filter designed for message absent.

2. $\beta_2 \triangleq$ MSE when message present and filter designed for message present.

3. $\beta_3 \triangleq$ MSE when message absent and filter designed for message absent.

4. $\beta_4 \triangleq$ MSE when message absent but filter designed for message present.

It is assumed that the nonadaptive filter is designed on the basis of
the message being present; consequently, cases 1 and 3 will never
occur.

The steady-state, mean-square errors for both the nonadaptive and
adaptive filters can now be evaluated in terms of the above $\beta$'s.

The following quantities are defined:

$\rho_n$ = steady-state, mean-square error of the nonadaptive filter
which assumes the message is present.
$\rho_a$ = steady-state, mean-square error of the adaptive filter.

Then

$$\rho_n = P(\alpha_1)\,\beta_2 + P(\alpha_2)\,\beta_4 \tag{6.3}$$

and, since $\beta_3 = 0$,

$$\rho_a = P(\alpha_1)\,\beta_2. \tag{6.4}$$

The percent improvement $I$ of the adaptive system over the conventional
filter is

$$I = \frac{\rho_n - \rho_a}{\rho_n} \times 100 = \left[1 + \frac{P(\alpha_1)\,\beta_2}{P(\alpha_2)\,\beta_4}\right]^{-1} \times 100. \tag{6.5}$$

Note that

$$\beta_2 - \beta_4 = \gamma, \tag{6.6}$$

where $\gamma$ is defined as the steady-state, mean-square error due to the
distortion of the message by the optimal filter. Thus,

$$I = \left[1 + \frac{P(\alpha_1)}{P(\alpha_2)}\left(1 + \frac{\gamma}{\beta_4}\right)\right]^{-1} \times 100. \tag{6.7}$$

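The steady-state, mean-square errors $\beta_4$ and $\gamma$ may be evaluated
using the theory of sampled-data systems [Ref. 19]. Due to previous
use of the symbol $z$ in this text, the symbol $\lambda$ will be used for the
discrete-time complex frequency variable. It may be shown, using the
theory of sampled-data systems, that

$$\gamma = \frac{1}{2\pi j}\oint \left[1 - H(\lambda)\right]\left[1 - H(\lambda^{-1})\right]\Phi_{yy}(\lambda)\,\frac{d\lambda}{\lambda} \tag{6.8}$$

$$\beta_4 = \frac{1}{2\pi j}\oint H(\lambda)\,H(\lambda^{-1})\,\Phi_{nn}(\lambda)\,\frac{d\lambda}{\lambda} \tag{6.9}$$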

where $\Phi_{yy}(\lambda)$ and $\Phi_{nn}(\lambda)$ are the discrete-frequency,
power-spectral densities of the message and noise processes,
respectively. The discrete-time causal Wiener filter is found to be

$$H(\lambda) = \frac{1}{\Phi_{zz}^{+}(\lambda)}\left[\frac{\Phi_{yy}(\lambda)}{\Phi_{zz}^{-}(\lambda)}\right]_{+}, \tag{6.10}$$

where the positive- and negative-sign superscripts denote spectral
factorization operators and the positive subscript denotes an operator
that selects the positive (or real) time component.

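For the model of the stochastic processes described by Eqs. (6.1)
and (6.2)

$$\Phi_{zz}(\lambda) = \frac{m_2^2\,(\lambda - r)(\lambda - r^{-1})}{(\lambda - \phi)(\lambda - \phi^{-1})} \tag{6.11}$$

and

$$\Phi_{yy}(\lambda) = -\left(\frac{m_1^2}{\phi}\right)\frac{\lambda}{(\lambda - \phi)(\lambda - \phi^{-1})}, \tag{6.12}$$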


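where $r, \phi < 1$.

The quantity $r$ is defined as the solution that is less than unity
in magnitude of the equation

$$r + r^{-1} = \phi + \phi^{-1}\left[1 + \left(\frac{m_1}{m_2}\right)^{2}\right]. \tag{6.13}$$

Substitution of Eqs. (6.11) and (6.12) in Eq. (6.10) yields

$$H(\lambda) = \frac{c\,\lambda}{\lambda - r}. \tag{6.14}$$

By using the above results, the mean-square errors may be found to
be

$$\beta_4 = m_1^4\left[m_2\,\phi\,(r^{-1} - \phi)\right]^{-2}\left[1 - r^2\right]^{-1} \tag{6.15}$$

and

$$\begin{aligned}
\gamma = \frac{m_1^2}{\phi^2}\Big[&(r^2 + c^2 - 1)\,\phi^2 + \{(1 - r^2)c^2 - 2(1 - r^2)c + 1 - r^4\}\,r^{-1}\phi \\
&- c^2 + 2(1 - r^2)c + r^2 - 1\Big] \times \left[(\phi^{-1} - \phi)(r^{-1} - r)(r - \phi)(r - \phi^{-1})\right]^{-1},
\end{aligned} \tag{6.16}$$

where

$$c = \left(\frac{m_1}{m_2}\right)^{2}\left[\phi\,(r^{-1} - \phi)\right]^{-1}.$$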

The  percent  improvement  I  will  be  evaluated  for  the  following 
numerical  example: 

4  —  j/g  1  ®  j  =  ^  ®  > 

$P(\alpha_1) = 0.1$, and $P(\alpha_2) = 0.9$.

In this case the percent improvement of the adaptive system over the
conventional filter (which is designed on the basis of the message being
present) is

$$I = 72.$$

This represents a very significant achievement, since the best possible
improvement under any circumstance is 100 percent.
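
The chain of formulas in this example can be traced numerically. The
sketch below is an illustration only: the function name `example_a` is
an assumption, it takes 0 < phi < 1 so that the root of Eq. (6.13)
inside the unit circle is real, and the parameter values used to
exercise it are hypothetical rather than those of the numerical example
above.

```python
import math

def example_a(phi, m1, m2, p1):
    """Evaluate r, c, beta_4, gamma and the percent improvement I for
    message-presence probability p1 = P(alpha_1); assumes 0 < phi < 1."""
    s = phi + (1.0 + (m1 / m2) ** 2) / phi           # right side of Eq. (6.13)
    r = 0.5 * (s - math.sqrt(s * s - 4.0))           # root with |r| < 1
    c = (m1 / m2) ** 2 / (phi * (1.0 / r - phi))     # gain constant c
    beta4 = m1 ** 4 / ((m2 * phi * (1.0 / r - phi)) ** 2 * (1.0 - r * r))  # (6.15)
    num = ((r * r + c * c - 1.0) * phi ** 2
           + ((1 - r * r) * c * c - 2 * (1 - r * r) * c + 1 - r ** 4) * phi / r
           - c * c + 2 * (1 - r * r) * c + r * r - 1.0)
    den = (1 / phi - phi) * (1 / r - r) * (r - phi) * (r - 1 / phi)
    gamma = (m1 / phi) ** 2 * num / den              # Eq. (6.16)
    improvement = 100.0 / (1.0 + p1 / (1.0 - p1) * (1.0 + gamma / beta4))  # (6.7)
    return r, c, beta4, gamma, improvement

# Hypothetical parameter values, chosen only to exercise the formulas.
r, c, beta4, gamma, improvement = example_a(phi=0.5, m1=1.0, m2=1.0, p1=0.1)
```

A useful consistency check is that beta4 + gamma, the matched-filter
error of Eq. (6.6), equals c times the noise power m2**2, which is the
familiar steady-state relation between filter gain and filtering error.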

B.  EXAMPLE  B 

The example analyzed in this section is a particular case of the
random jamming situation presented in Chapter II and pictured in Fig. 3.
It is desired to perform the best filtering to separate the message from
the receiver noise or possibly from the sum of the receiver noise and
an independent jamming signal. It will be assumed that the message,
receiver noise, and jamming processes are stationary and may be described
by the following difference equations.

$\alpha_1$ : Jamming absent

$$\begin{aligned}
x_1(t+1) &= \phi\,x_1(t) + u_1(t) \\
x_2(t+1) &= u_2(t) \\
z_1(t) &= m_1 x_1(t) + m_2 x_2(t)
\end{aligned} \tag{6.17}$$

$\alpha_2$ : Jamming present

$$\begin{aligned}
x_1(t+1) &= \phi\,x_1(t) + u_1(t) \\
x_2(t+1) &= u_2(t) \\
x_3(t+1) &= u_3(t) \\
z_2(t) &= m_1 x_1(t) + m_2 x_2(t) + m_3 x_3(t)
\end{aligned} \tag{6.18}$$

The driving forces $\{u_i(t) : t = -\infty, \ldots, -1, 0, 1, \ldots, \infty;\ i = 1, 2, 3\}$
are independent gaussian random processes with independent time samples
of unity variance.

Again, there are four possible situations that might occur with a
nonadaptive filter. The steady-state, mean-square errors for these
cases are defined as follows:

1. $\theta_1 \triangleq$ MSE when jamming present but filter designed for jamming absent

2. $\theta_2 \triangleq$ MSE when jamming present and filter designed for jamming present

3. $\theta_3 \triangleq$ MSE when jamming absent and filter designed for jamming absent

4. $\theta_4 \triangleq$ MSE when jamming absent but filter designed for jamming present

It is assumed that the nonadaptive filter is designed on the basis of the
jamming being absent; consequently, cases 2 and 4 will never arise.

The steady-state, mean-square errors for both the nonadaptive and
adaptive filters can now be evaluated in terms of the above $\theta$'s.

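The following quantities are defined:

$\nu_n$ = steady-state, mean-square error of the nonadaptive filter which
assumes no jamming is present.

$\nu_a$ = steady-state, mean-square error of the adaptive filter.

Then

$$\nu_n = P(\alpha_2)\,\theta_1 + P(\alpha_1)\,\theta_3 \tag{6.19}$$

and

$$\nu_a = P(\alpha_1)\,\theta_3 + P(\alpha_2)\,\theta_2. \tag{6.20}$$

The percent improvement $I$ of the adaptive system over the conventional
filter is

$$I = \frac{\nu_n - \nu_a}{\nu_n} \times 100 = \left[\theta_1 - \theta_2\right]\left[\theta_1 + \frac{P(\alpha_1)}{P(\alpha_2)}\,\theta_3\right]^{-1} \times 100. \tag{6.21}$$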
Using the theory of sampled-data systems


•i-  Viij  I  f  •  (••“> 


where $\Phi_{nn}^{1}(\lambda)$ and $\Phi_{nn}^{2}(\lambda)$ are the power spectral
densities of receiver noise and receiver noise plus jamming noise,
respectively. $H_1(\lambda)$ and $H_2(\lambda)$ are the Wiener filters designed on
assumptions $\alpha_1$ and $\alpha_2$, respectively. Further manipulation yields

$$\theta_1 = c^2 m_3^2\left[1 - r^2\right]^{-1} + \theta_3. \tag{6.23}$$

The quantities $r$ and $c$ are found from Eqs. (6.13) and (6.16).

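For the stochastic processes described by Eqs. (6.17) and (6.18),

$$\Phi_{yy}(\lambda) = -\left(\frac{m_1^2}{\phi}\right)\frac{\lambda}{(\lambda - \phi^{-1})(\lambda - \phi)}, \tag{6.24}$$

$$\Phi_{nn}^{1}(\lambda) = m_2^2, \tag{6.25}$$

and

$$\Phi_{nn}^{2}(\lambda) = m_2^2 + m_3^2 = a^2. \tag{6.26}$$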

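Consequently, the power spectral densities of the two elemental processes
are similar in form.

$$\Phi_{zz}^{1}(\lambda) = \frac{m_2^2\,(\lambda - r)(\lambda - r^{-1})}{(\lambda - \phi)(\lambda - \phi^{-1})}, \qquad \Phi_{zz}^{2}(\lambda) = \frac{a^2\,(\lambda - \tau)(\lambda - \tau^{-1})}{(\lambda - \phi)(\lambda - \phi^{-1})} \tag{6.27}$$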


The zeros of the latter power spectral density are found from the
equation

$$\tau + \tau^{-1} = \phi + \phi^{-1}\left[1 + \left(\frac{m_1}{a}\right)^{2}\right]. \tag{6.28}$$
Because of the similarity of form of the power spectral densities
involved, the Wiener filters for the two cases differ only in parameter
values.


(6.29) 


where 


_-lv,-l 

r  )] 


The mean-square error $\theta_3$ may be decomposed into the error power
caused by the noise and the message distortion power caused by the
Wiener filter. Thus,

$$\theta_3 = m_2^2\,c^2\,(1 - r^2)^{-1} + \gamma, \tag{6.30}$$

where $\gamma$ is defined by Eq. (6.16).

The remaining mean-square error is found to be

$$\theta_2 = a^2\,\hat{c}^2\,(1 - \tau^2)^{-1} + \hat{\gamma}, \tag{6.31}$$

where $\hat{\gamma}$ is evaluated from Eq. (6.16) with the substitutions
$r = \tau$ and $c = \hat{c}$ being made.

The  percent  improvement  I  will  be  evaluated  for  the  following 
numerical  example: 


♦  =  ,  m^»  1 ,  m2  =  /^,  m3  =  >/l5.75 , 

$P(\alpha_1) = 10/11$, and $P(\alpha_2) = 1/11$.

In this case the percent improvement of the adaptive system over the
conventional filter (which is designed on the basis of the jamming being
absent) is

$$I = 75.3.$$

Because of the low likelihood of jamming occurring, this represents a
particularly significant achievement for the adaptive filter.
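
The quantities of this example can likewise be traced in a short
sketch. Several things here are assumptions and not taken from the
text: the function names, the restriction 0 < phi < 1, the hypothetical
parameter values, and in particular the distortion term, which is
written in a compact form obtained by summing the geometric series for
a filter with impulse response c r^k rather than in the expanded form
of Eq. (6.16).

```python
import math

def wiener_params(phi, m1, noise_power):
    """Pole r and gain c of the first-order Wiener filter for a white
    noise of the given power; same relations as Eqs. (6.13) and (6.28)."""
    s = phi + (1.0 + m1 * m1 / noise_power) / phi
    r = 0.5 * (s - math.sqrt(s * s - 4.0))
    c = (m1 * m1 / noise_power) / (phi * (1.0 / r - phi))
    return r, c

def mse(phi, m1, noise_power, r, c):
    """Noise power passed by the filter plus message distortion power."""
    noise_term = noise_power * c * c / (1.0 - r * r)
    distortion = m1 * m1 * ((1 - r * r) * (1 - r * phi) - 2 * c * (1 - r * r)
                            + c * c * (1 + r * phi)) \
        / ((1 - phi * phi) * (1 - r * r) * (1 - r * phi))
    return noise_term + distortion

def example_b(phi, m1, m2, m3, p1):
    """Return theta_1, theta_2, theta_3 and the percent improvement I,
    with p1 = P(alpha_1) the probability that jamming is absent."""
    n1, n2 = m2 * m2, m2 * m2 + m3 * m3       # noise powers, (6.25), (6.26)
    r, c = wiener_params(phi, m1, n1)         # filter designed for no jamming
    tau, c_hat = wiener_params(phi, m1, n2)   # filter designed for jamming
    theta3 = mse(phi, m1, n1, r, c)
    theta1 = mse(phi, m1, n2, r, c)           # mismatched filter under jamming
    theta2 = mse(phi, m1, n2, tau, c_hat)
    nu_n = (1 - p1) * theta1 + p1 * theta3    # Eq. (6.19)
    nu_a = p1 * theta3 + (1 - p1) * theta2    # Eq. (6.20)
    return theta1, theta2, theta3, 100.0 * (nu_n - nu_a) / nu_n

# Hypothetical parameter values, chosen only to exercise the formulas.
theta1, theta2, theta3, improvement = example_b(0.5, 1.0, 1.0, 2.0, 10.0 / 11.0)
```

For these values the mismatched error theta1 exceeds theta3 by exactly
the jamming power passed through the no-jamming filter, as in
Eq. (6.23), and the matched error theta2 is smaller than theta1, which
is where the improvement of the adaptive filter comes from.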
VII. EXTENSION OF RESULTS


Processes with deterministic mean-value functions, such as mentioned
in case 2 of Chapter I, have not been specifically treated; however,
the analysis derived in this work can be extended in a simple manner to
handle this situation. The elemental estimators will differ in
that the observed process $\{z(t) : t = 1, 2, \ldots\}$ will first have its
hypothesized mean-value functions
$\{\bar{z}_i(t) : t = 1, 2, \ldots;\ i = 1, 2, \ldots, L\}$
subtracted off to obtain the zero-mean processes necessary for use of the
theory of Chapter IV. The best estimate $\hat{\omega}(\alpha_i)$ will then
consist of the hypothesized mean value of $\omega$, $\bar{\omega}(\alpha_i)$,
plus the best estimate of the zero-mean component $\omega_{ac}$ of the state
of nature $\omega$. Since the mean-value function is considered to be
deterministic, it may be thought of as being generated by a free,
dynamical system with the proper initial conditions. Consequently, the
optimal estimator for a nonzero-mean process will include a model of the
mean-value function generator as well as a model of the zero-mean
component of the process.

Similarly, by merely allowing the input distribution matrix $D(t)$
to be identically the zero matrix for $t = 1, 2, \ldots$, it is possible to
handle the case in which the message component of the observable process
is formed by the proper initial conditions, which are assumed to be
gaussianly distributed, on one of a finite number of possible free,
linear, dynamic systems (i.e., case 3 of Chapter I). Because of this
condition on $D(t)$, after a sufficient number of observations $j$, the
covariance matrix $P(t|t-1)$ will become the zero matrix for $t > j$.
This means that the state of the system has been learned perfectly.
Since no further randomness is allowed to enter the system, it will be
possible to predict, filter, or interpolate the process perfectly
thereafter without taking any subsequent observations. In this situation,
expressions for the gain vectors, e.g., Eq. (4.15), will become
indeterminate forms. Fortunately, the error signal
$\{\tilde{z}(t|t-1) : t = j, j+1, \ldots\}$
will be identically zero, and any value may be used for the gain vectors.

In both these cases the only differences are in the details of the
elemental estimators. The weighting-coefficient calculator structure
remains the same.

VIII. CONCLUSIONS


For sampled, scalar-valued, observable, gaussian, random processes,
the optimal adaptive estimate is an appropriately weighted summation of
conditional estimates, which are formed by a set of elemental estimators
(linear dynamic systems). The weighting coefficients are determined by
relatively simple, nonlinear operations on the observed data.

When the observed process also possesses the implicit-markov
property, the construction of the optimal adaptive estimator is
simplified in two major aspects. First, the calculation of the weighting
coefficients is facilitated since the inversion of matrices that grow
with time is avoided; also, the evaluation of the determinants of these
matrices is reduced to the multiplication of appropriate scalar-valued
constants. Second, the elemental estimators may be implemented more
readily since the sufficient statistic remains of fixed dimension as the
amount of observed data increases. Furthermore, under the implicit-markov
assumption, the structure of an elemental estimator, whether it be a
predictor, filter, or interpolator, can be derived in a unified approach
by the introduction of the concept of the displaced covariance matrix.

If  the  construction  of  the  optimal  estimator  is  to  be  feasible,  the 
unknown  parameter  vector  must  come  from  a  finite  set  of  known  parameter 
vectors  (perhaps  time-variant ) .  Fortunately,  many  engineering  problems 
may  be  adequately  represented  by  such  a  model.  The  optimal  adaptive 
estimator  is  feasible  to  implement  for  filtering  problems  when  the 
presence  of  either  the  message  or  the  jamming  process  is  uncertain.  The 
performance  of  an  adaptive  filter  is  significantly  better  than  that  of 
a  nonadaptlve  filter  for  both  of  these  cases. 

The  engineering  usefulness  of  the  optimal  adaptive  estimator  is 
enhanced  by  the  fact  that  this  estimator  is  applicable  to  an  important 
class  of  stochastic  control  problems.  For  linear-dynamic,  quadratic-cost, 
stochastic  control  problems,  the  optimal  control  law  is  a  linear  function 
of  the  optimal  estimate  of  the  state  vector  of  the  control  dynamics. 
Therefore, when the observations of the plant (i.e., controlled object)
output are corrupted by a gaussian random process described by an initially
unknown parameter vector, an optimal adaptive estimator is used in the
implementation of the optimal control law.
IX.  RECOMMENDATIONS  FOR  FUTURE  WORK


The analysis presented in this investigation could be extended, at the
cost of added complexity, to handle vector-valued observable processes.
Perhaps a more significant achievement would be to obtain analogous results
for continuous-time processes. Some difficulties in calculating the
required weighting coefficients might arise here since some of the simple
relations for determinants, etc., would no longer exist.

A very difficult problem occurs when the parameter vector describing
the process can take on a continuum of possible values. Since at present
it does not appear to be feasible to construct a continuum of weighting
coefficients or estimators, consideration should be given to various
suboptimal schemes. One possible procedure would be to build a set of
elemental estimators based on parameter vectors distributed uniformly or
appropriately throughout the space A of possible parameter vectors. Each
elemental estimator could be designed on the basis of a mean parameter
vector with a large enough variance that the set of mean vectors and
their variances more or less filled the space A. Thus, it would be
assumed that the structure of the process could not be learned any more
accurately than these variances, and the optimal elemental estimator
would be constructed as described by Rauch [Ref. 3]. Numerous questions
exist about the accuracy of this approach and the convergence of the
weighting coefficients under these circumstances.

One of the most difficult subjects is the study of the convergence
of the weighting coefficients, as attested by the fact that sufficient
conditions for convergence are found from rather abstract and advanced
probability theory. Direct analytical approaches to this problem become
hopelessly complex. Naturally, determination of the rate of convergence
is even more difficult. Despite these difficulties, both the conditions
for convergence and the rate of convergence are subjects worthy of further
study since they are of great theoretical and practical interest.




SEL-63-143 


APPENDIX  A.  BRIEF  SUMMARY  OF  HILBERT  SPACE  THEORY 


The  purpose  of  this  appendix  is  to  introduce  some  elementary  concepts 
of  Hilbert  space  theory  to  the  reader  who  may  be  unfamiliar  with  them. 
Since  random  variables  may  be  regarded  as  vectors  in  an  abstract  Hilbert 
space,  various  methods  from  the  theory  of  the  latter  subject  may  be 
applied  profitably  to  statistical  problems.  The  following  material 
closely  follows  the  approach  of  Parzen  [Ref.  12].  The  reader  who  is 
interested  in  a  more  thorough  and  rigorous  treatment  of  the  subject  is 
referred  to  the  above-mentioned  article. 

1.  Definitions

a.  Definition  1 

S is a linear vector space if and only if for any vectors u and
v in S, and any real number c, there exist vectors u+v and cu in S,
respectively, which satisfy the usual properties of addition and
multiplication. There must also exist in S a zero vector, denoted 0, with
the natural property under addition.

b.  Definition  2 

S  is  an  inner  product  space  if  and  only  if  to  every  pair  of 
vectors  u  and  v  in  S  there  corresponds  a  real  number,  denoted 
(u,v),  which  is  called  the  inner  product  of  u  and  v.  The  inner  product 
must  possess  the  following  properties:  for  all  vectors  u,  v,  and 
w  in  S  and  for  every  real  number  c, 


  i)  $(cu, v) = c\,(u, v)$

 ii)  $(u + v, w) = (u, w) + (v, w)$

iii)  $(u, v) = (v, u)$

 iv)  $(u, u) > 0$ if and only if $u \neq 0$.

c.  Definition  3 

The  norm  of  a  vector  u,  denoted  ||u||,  in  an  inner  product 
space  S  is  defined  as  follows: 

$$\|u\| = (u, u)^{1/2}.$$
d.  Definition 4

S is a complete metric space (under the previously defined norm)
if and only if for any sequence of vectors $\{u_n\}$ in S such that
$\|u_m - u_n\| \to 0$ as $m, n \to \infty$, there exists a vector u in S
such that $\|u_n - u\| \to 0$ as $n \to \infty$.

e.  Definition  5 

S is an abstract Hilbert space if and only if it is a linear
vector space, an inner product space, and finally a complete metric
space.

f.  Definition  6 

The Hilbert space, denoted by T(t), spanned by a time series
$\{x(j) : j = 1, 2, \ldots, t\}$, is defined to consist of all random variables
v (perhaps vector-valued) that are linear combinations of the random
variables $\{x(j) : j = 1, 2, \ldots, t\}$.

Inasmuch as random variables, e.g., u and v (perhaps vector-valued),
satisfy the properties required of a vector or point in a
Hilbert space under the inner product

$$(u, v) \triangleq E\{u^T v\},$$

the projection theorem is applicable to the estimation of stochastic
processes.


2.  Projection  Theorem 

Let S be an abstract Hilbert space, let T be a Hilbert subspace
of S, let $\omega$ be a vector in S, and let $\hat{\omega}$ be a vector in T. A
necessary and sufficient condition that $\hat{\omega}$ is the unique vector in T
satisfying

$$\|\omega - \hat{\omega}\|^2 = \min_{v \in T} \|\omega - v\|^2 \quad \text{(minimization property)}$$

is that

$$(\omega - \hat{\omega}, v) = 0 \quad \text{for all } v \in T \quad \text{(orthogonality property)}.$$

The vector $\hat{\omega}$ is called the perpendicular projection of $\omega$ onto T.
Proof


The proof must consist of three parts: the equivalence of the
minimization and the orthogonality properties, the uniqueness, and the
existence of the vector $\hat{\omega}$ must be established.

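a.  Equivalence of minimization and orthogonality properties

Suppose first that the orthogonality property holds. For any $v \in T$,

$$\|\omega - v\|^2 = \|\omega - \hat{\omega}\|^2 + 2(\omega - \hat{\omega}, \hat{\omega} - v) + \|\hat{\omega} - v\|^2 .$$

Since T is a linear space it contains $\hat{\omega} - v$, and consequently
$(\omega - \hat{\omega}, \hat{\omega} - v) = 0$. Therefore,

$$\|\omega - v\|^2 \ge \|\omega - \hat{\omega}\|^2$$

as claimed.

Conversely, suppose there exists a vector $v_1 \in T$ such that
$(\omega - \hat{\omega}, v_1) = a \neq 0$. Then for any real number b,

$$\|\omega - \hat{\omega} - b v_1\|^2 = \|\omega - \hat{\omega}\|^2 - 2(\omega - \hat{\omega}, b v_1) + b^2 \|v_1\|^2$$

$$= \|\omega - \hat{\omega}\|^2 - 2ba + b^2 \|v_1\|^2 .$$

By suitable choice of b (for example, $b = a / \|v_1\|^2$) the sum of the
last two terms of the above equation can be made negative. Since
$\hat{\omega} + b v_1$ lies in T, the optimality of $\hat{\omega}$ is then
contradicted.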

b.  Uniqueness 

The uniqueness of $\hat{\omega}$ may be readily established by the use of
properties ii) and iv) of definition 2.
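
One way to fill in the uniqueness step is sketched below. It is a reconstruction from the definitions in this appendix, not the author's own wording, and the symbols $\hat{\omega}_1$ and $\hat{\omega}_2$ are introduced here only for the argument. Property i) with c = -1 is also used to form the difference of inner products.

```latex
% Assume two vectors \hat{\omega}_1, \hat{\omega}_2 in T both satisfy
% the orthogonality property:
(\omega - \hat{\omega}_1, v) = 0, \qquad
(\omega - \hat{\omega}_2, v) = 0 \qquad \text{for all } v \in T .

% Subtracting, by properties i) and ii) of definition 2:
(\hat{\omega}_2 - \hat{\omega}_1, v) = 0 \qquad \text{for all } v \in T .

% T is linear, so v = \hat{\omega}_2 - \hat{\omega}_1 is admissible:
(\hat{\omega}_2 - \hat{\omega}_1,\ \hat{\omega}_2 - \hat{\omega}_1)
  = \|\hat{\omega}_2 - \hat{\omega}_1\|^2 = 0 .

% Property iv) then forces
\hat{\omega}_2 - \hat{\omega}_1 = 0 , \qquad \text{i.e.} \qquad
\hat{\omega}_1 = \hat{\omega}_2 .
```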

c.  Existence

Let

$$d = \inf_{v \in T} \|\omega - v\| .$$

Let $\{v_n\}$ be a sequence of vectors in T such that $\|\omega - v_n\| \to d$
as $n \to \infty$. The sequence $\{v_n\}$ is a Cauchy sequence, as may be
established by use of the parallelogram law outlined below.
For any vectors x and y in S, the parallelogram law states

$$\|x - y\|^2 + \|x + y\|^2 = 2\|x\|^2 + 2\|y\|^2 .$$

Use of the above relation yields for every m and n

$$\|v_n - v_m\|^2 = \|(v_n - \omega) - (v_m - \omega)\|^2$$

$$= 2\|v_n - \omega\|^2 + 2\|v_m - \omega\|^2 - 4\,\|\tfrac{1}{2}(v_n + v_m) - \omega\|^2 .$$

Since $\tfrac{1}{2}(v_n + v_m)$ belongs to T, it follows that

$$\|\tfrac{1}{2}(v_n + v_m) - \omega\|^2 \ge d^2 ,$$

and that

$$\|v_n - v_m\|^2 \le 2\|v_n - \omega\|^2 + 2\|v_m - \omega\|^2 - 4d^2 .$$

As n and m tend to infinity the right side of the above inequality
tends to zero. Therefore, $\{v_n\}$ is a Cauchy sequence in a Hilbert
space and consequently converges in norm to some limit vector v' in S;
because T is itself a Hilbert space, and hence complete, v' lies in T.
By the triangle inequality and the definition of d,

$$d \le \|\omega - v'\| \le \|v' - v_n\| + \|\omega - v_n\| .$$

Since $\|v' - v_n\| \to 0$, the right-hand side of the above inequality tends
to d. Therefore, $\|\omega - v'\| = d$ and there does exist a vector v'
(now identifiable as $v' = \hat{\omega}$) in T satisfying the projection theorem.
The proof of the projection theorem is now complete.
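
The use of the projection theorem in estimation can be illustrated by a small numerical sketch. It assumes a scalar random variable x observed through z1 = x + n1 and z2 = x + n2, with x, n1, and n2 zero-mean, mutually uncorrelated, and of unit variance. This model, the second moments that follow from it, and the function names are hypothetical and are chosen only to make the arithmetic easy to check. The orthogonality property with the inner product $E\{u^T v\}$ gives two linear equations for the coefficients of the best linear estimate.

```python
def solve_2x2(m, rhs):
    """Solve a 2-by-2 linear system m c = rhs by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [
        (rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det,
        (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det,
    ]


def mean_square_error(c, var_x, r, R):
    """E{(x - c1 z1 - c2 z2)^2} written in terms of second moments."""
    cross = sum(ci * ri for ci, ri in zip(c, r))
    quad = sum(c[i] * R[i][j] * c[j] for i in range(2) for j in range(2))
    return var_x - 2.0 * cross + quad


# Second moments implied by the assumed model.
var_x = 1.0                    # E{x^2}
r = [1.0, 1.0]                 # E{x z_j}
R = [[2.0, 1.0], [1.0, 2.0]]   # E{z_i z_j}

# Orthogonality property: (x - c1 z1 - c2 z2, z_j) = 0 for j = 1, 2,
# which is the linear system R c = r.
c = solve_2x2(R, r)

# Inner products of the estimation error with each observation.
residual_inner_products = [
    r[j] - sum(c[i] * R[i][j] for i in range(2)) for j in range(2)
]

best = mean_square_error(c, var_x, r, R)
other = mean_square_error([0.5, 0.0], var_x, r, R)  # an arbitrary competitor
```

The error of the projected estimate is orthogonal to both observations, and its mean-square value is smaller than that of the competing linear estimate, in agreement with the minimization property.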